
Airflow

Airflow Integration.

Beta feature

This functionality is in beta and is subject to change. The design and code are less mature than official generally available features and are provided as-is with no warranties. Beta features are not subject to the support service level agreement of official generally available features.

What is an Elastic integration?

This integration is powered by Elastic Agent. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to a host. It can also protect hosts from security threats, query data from operating systems, forward data from remote services or hardware, and more. Refer to our documentation for a detailed comparison between Beats and Elastic Agent.

Prefer to use Beats for this use case? See Filebeat modules for logs or Metricbeat modules for metrics.

Airflow is a platform to programmatically author, schedule, and monitor workflows. Airflow is used to author workflows as Directed Acyclic Graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. This integration collects Airflow metrics through a StatsD server, run by Elastic Agent, to which Airflow sends its metrics. The default data stream is StatsD.
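Airflow emits these metrics as plain-text StatsD lines sent over UDP. The following minimal Python sketch shows the shape of such a datagram. It assumes the standard name:value|type line format (c for counters, g for gauges, ms for timers) and a listener at 127.0.0.1:8125. Substitute the address where Elastic Agent actually runs. The helper names format_statsd and send_metric are hypothetical and are not part of Airflow or the integration.

```python
import socket

# Type suffixes of the plain-text StatsD line protocol.
STATSD_TYPES = {"counter": "c", "gauge": "g", "timer": "ms"}


def format_statsd(name: str, value: float, metric_type: str) -> str:
    """Build one StatsD line, for example 'scheduler_heartbeat:1|c'."""
    return f"{name}:{value}|{STATSD_TYPES[metric_type]}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one StatsD line as a UDP datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

For example, send_metric(format_statsd("scheduler_heartbeat", 1, "counter")) sends the line scheduler_heartbeat:1|c. This could be used to check that the agent's listener is reachable before Airflow itself is reconfigured.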

Compatibility

The Airflow module is tested with Airflow 2.4.0. It should work with version 2.0.0 and later.
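To check an existing installation against these versions, a small Python helper like the one below could compare the installed package version with 2.0.0 and 2.4.0. This is a sketch under a few assumptions. It assumes Airflow was installed as the apache-airflow distribution. It also reads only the leading numeric release parts, so pre-release suffixes such as rc1 are ignored. The function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MINIMUM = (2, 0, 0)  # "should work with version 2.0.0 and later"
TESTED = (2, 4, 0)   # version the integration is tested with


def parse_release(text: str) -> Tuple[int, int, int]:
    """Turn '2.4.0' or '2.7.3rc1' into (major, minor, patch), ignoring suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at the first non-digit, e.g. the 'rc' in '3rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '2.10'
    return (parts[0], parts[1], parts[2])


def compatibility(text: str) -> str:
    """Classify a version string against the documented support range."""
    release = parse_release(text)
    if release < MINIMUM:
        return "unsupported"
    if release == TESTED:
        return "tested"
    return "expected to work"


def installed_airflow_version() -> Optional[str]:
    """Return the installed apache-airflow version, or None if it is not installed."""
    try:
        return version("apache-airflow")
    except PackageNotFoundError:
        return None
```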

StatsD

The StatsD data stream retrieves Airflow metrics using a StatsD server. The Airflow integration relies on StatsD to receive metrics from Airflow. Refer to the StatsD documentation for more details.

Add the following lines to your Airflow configuration file (for example, airflow.cfg), leaving statsd_prefix empty and replacing %HOST% with the address where Elastic Agent is running:

[metrics]
statsd_on = True
statsd_host = %HOST%
statsd_port = 8125
statsd_prefix =
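Before restarting Airflow, the [metrics] section could be sanity-checked with a short Python sketch like the one below. It uses the standard configparser module, and the function names are hypothetical.

Two details matter. First, interpolation is disabled so that the %HOST% placeholder is not treated as interpolation syntax. Second, statsd_prefix must be present and empty, because if the key is absent Airflow falls back to its default prefix (airflow) and prepends it to every metric name.

The same settings can also be supplied as environment variables using Airflow's AIRFLOW__{SECTION}__{KEY} naming convention, which as_env_vars illustrates.

```python
import configparser
from typing import Dict, List


def _load(text: str) -> configparser.ConfigParser:
    # interpolation=None keeps placeholders like %HOST% from raising
    # InterpolationSyntaxError when values are read.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def check_metrics_section(text: str) -> List[str]:
    """Return a list of problems in the [metrics] section; empty means it looks OK."""
    parser = _load(text)
    if not parser.has_section("metrics"):
        return ["missing [metrics] section"]
    metrics = parser["metrics"]
    problems = []
    if not metrics.getboolean("statsd_on", fallback=False):
        problems.append("statsd_on must be True")
    host = metrics.get("statsd_host", "").strip()
    if not host or host == "%HOST%":
        problems.append("statsd_host must be the Elastic Agent address")
    try:
        port = metrics.getint("statsd_port", fallback=8125)
    except ValueError:
        port = -1  # non-numeric port
    if not 0 < port < 65536:
        problems.append("statsd_port must be a valid UDP port")
    # An absent key falls back to Airflow's default prefix, so require it explicitly.
    if "statsd_prefix" not in metrics or metrics["statsd_prefix"].strip():
        problems.append("statsd_prefix must be present and empty")
    return problems


def as_env_vars(text: str) -> Dict[str, str]:
    """Map [metrics] keys to AIRFLOW__METRICS__<KEY> environment variable names."""
    metrics = _load(text)["metrics"]
    return {f"AIRFLOW__METRICS__{key.upper()}": value for key, value in metrics.items()}
```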

Exported fields

| Field | Description | Type | Metric Type |
| @timestamp | Event timestamp. | date | |
| agent.id | | keyword | |
| airflow.*.count | Airflow counters | object | counter |
| airflow.*.max | Airflow max timers metric | object | |
| airflow.*.mean | Airflow mean timers metric | object | |
| airflow.*.mean_rate | Airflow mean rate timers metric | object | |
| airflow.*.median | Airflow median timers metric | object | |
| airflow.*.min | Airflow min timers metric | object | |
| airflow.*.stddev | Airflow standard deviation timers metric | object | |
| airflow.*.value | Airflow gauges | object | gauge |
| airflow.dag_file | Airflow DAG file metadata | keyword | |
| airflow.dag_id | Airflow DAG ID metadata | keyword | |
| airflow.job_name | Airflow job name metadata | keyword | |
| airflow.operator_name | Airflow operator name metadata | keyword | |
| airflow.pool_name | Airflow pool name metadata | keyword | |
| airflow.scheduler_heartbeat.count | Airflow scheduler heartbeat | double | |
| airflow.status | Airflow status metadata | keyword | |
| airflow.task_id | Airflow task ID metadata | keyword | |
| cloud.account.id | The cloud account or organization ID used to identify different entities in a multi-tenant environment. Examples: AWS account ID, Google Cloud ORG ID, or other unique identifier. | keyword | |
| cloud.availability_zone | Availability zone in which this host is running. | keyword | |
| cloud.image.id | Image ID for the cloud instance. | keyword | |
| cloud.instance.id | Instance ID of the host machine. | keyword | |
| cloud.instance.name | Instance name of the host machine. | keyword | |
| cloud.machine.type | Machine type of the host machine. | keyword | |
| cloud.project.id | Name of the project in Google Cloud. | keyword | |
| cloud.provider | Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean. | keyword | |
| cloud.region | Region in which this host is running. | keyword | |
| container.id | Unique container ID. | keyword | |
| container.image.name | Name of the image the container was built on. | keyword | |
| container.labels | Image labels. | object | |
| container.name | Container name. | keyword | |
| container.runtime | Runtime managing this container. | keyword | |
| data_stream.dataset | Data stream dataset. | constant_keyword | |
| data_stream.namespace | Data stream namespace. | constant_keyword | |
| data_stream.type | Data stream type. | constant_keyword | |
| ecs.version | ECS version this event conforms to. ecs.version is a required field and must exist in all events. When querying across multiple indices, which may conform to slightly different ECS versions, this field lets integrations adjust to the schema version of the events. | keyword | |
| event.dataset | Event dataset. | constant_keyword | |
| event.module | Event module. | constant_keyword | |
| host | A host is defined as a general computing instance. ECS host.* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes. | group | |
| host.architecture | Operating system architecture. | keyword | |
| host.containerized | Whether the host is a container. | boolean | |
| host.domain | Name of the domain of which the host is a member. For example, on Windows this could be the host's Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host's LDAP provider. | keyword | |
| host.hostname | Hostname of the host. It normally contains what the hostname command returns on the host machine. | keyword | |
| host.id | Unique host ID. As hostname is not always unique, use values that are meaningful in your environment. Example: The current usage of beat.name. | keyword | |
| host.ip | Host IP addresses. | ip | |
| host.mac | Host MAC addresses. | keyword | |
| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use. | keyword | |
| host.os.build | OS build information. | keyword | |
| host.os.codename | OS codename, if any. | keyword | |
| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword | |
| host.os.kernel | Operating system kernel version as a raw string. | keyword | |
| host.os.name | Operating system name, without the version. | keyword | |
| host.os.name.text | Multi-field of host.os.name. | text | |
| host.os.platform | Operating system platform (such as centos, ubuntu, windows). | keyword | |
| host.os.version | Operating system version as a raw string. | keyword | |
| host.type | Type of host. For cloud providers this can be the machine type like t2.medium. If vm, this could be the container, for example, or other information meaningful in your environment. | keyword | |
| service.address | Service address | keyword | |
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, service.type would be elasticsearch. | keyword | |
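The airflow.* fields above follow a pattern:

- Counters land in airflow.<metric>.count.
- Gauges land in airflow.<metric>.value.
- Timers land in airflow.<metric>.min, .max, .mean, .median, and .stddev.

The Python sketch below illustrates that layout. It is a hypothetical sketch, not Elastic Agent's implementation, and it rests on several assumptions:

- Input is plain StatsD lines.
- Counters are summed.
- The last gauge value wins.
- Timer stddev is the population standard deviation.
- Sample rates are ignored.
- mean_rate is not computed.
- The metadata fields the integration extracts from metric names, such as airflow.dag_id and airflow.task_id, are not handled.

Metric names such as dag.loading are examples only.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_lines(payload: str) -> List[Tuple[str, float, str]]:
    """Parse newline-separated StatsD lines ('name:value|type[|@rate]')."""
    samples = []
    for line in payload.splitlines():
        name, sep, rest = line.strip().partition(":")
        fields = rest.split("|")
        if not name or not sep or len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            value = float(fields[0])
        except ValueError:
            continue
        # Sample rates (|@0.1) are ignored, so counters are not rescaled.
        samples.append((name, value, fields[1]))
    return samples


def to_fields(samples: List[Tuple[str, float, str]]) -> Dict[str, float]:
    """Aggregate samples into flat 'airflow.<metric>.<stat>' keys."""
    counters: Dict[str, float] = defaultdict(float)
    gauges: Dict[str, float] = {}
    timers: Dict[str, List[float]] = defaultdict(list)
    for name, value, kind in samples:
        if kind == "c":
            counters[name] += value       # counters are summed
        elif kind == "g":
            gauges[name] = value          # last gauge value wins
        elif kind == "ms":
            timers[name].append(value)    # timers keep every observation
    fields: Dict[str, float] = {}
    for name, total in counters.items():
        fields[f"airflow.{name}.count"] = total
    for name, value in gauges.items():
        fields[f"airflow.{name}.value"] = value
    for name, values in timers.items():
        fields[f"airflow.{name}.min"] = min(values)
        fields[f"airflow.{name}.max"] = max(values)
        fields[f"airflow.{name}.mean"] = statistics.mean(values)
        fields[f"airflow.{name}.median"] = statistics.median(values)
        # Population standard deviation (an assumption about the real aggregation).
        fields[f"airflow.{name}.stddev"] = statistics.pstdev(values)
    return fields
```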

Changelog

| Version | Type | Details |
| 0.5.1 | Bug fix | Add dimension field for container.id, which was previously missed during the package-spec v3 migration. |
| 0.5.0 | Enhancement | Update the package format_version to 3.0.0. |
| 0.4.0 | Enhancement | Enable time series data streams for the metrics datasets. This dramatically reduces storage for metrics and is expected to progressively improve query performance. For more details, see https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html. |
| 0.3.1 | Bug fix | Remove metric_type mapping for the 'airflow.scheduler.heartbeat' field and adjust the dashboard to visualize this field using 'last_value'. |
| 0.3.0 | Enhancement | Revert metrics field definition to the format used before introducing metric_type. |
| 0.2.0 | Enhancement | Add metric_type mapping for the fields of the statsd data stream. |
| 0.1.0 | Enhancement | Rename ownership from obs-service-integrations to obs-infraobs-integrations. |
| 0.0.5 | Bug fix | Modified the dimension field mapping to support public cloud deployment. |
| 0.0.4 | Enhancement | Added dimension fields to enable TSDB. |
| 0.0.3 | Enhancement | Added categories and/or subcategories. |
| 0.0.2 | Enhancement | Add dashboards. |
| 0.0.1 | Enhancement | Initial release. |
