
Datadog Observability Integration

ULTIMATE

Feature Preview: This feature is available for Ultimate subscribers as a subscription add-on and may incur additional charges.

The Datadog Observability Integration streams logs and metrics from all apps into your Datadog account for seamless observability.

Offered as an add-on feature, this backend integration enables monitoring from external systems, faster troubleshooting, and unified visibility across your application ecosystem.

Key Capabilities

1. Platform-Level Integration

  • Available as an optional add-on at the platform level

  • Automatically captures and streams monitoring data of all deployed apps

2. Application Log Streaming

  • Streams application logs in near real-time to Datadog

  • Includes records generated at the application layer for debugging and troubleshooting, including:

    • Log output of apps at various levels, including errors, warnings, info, and audit logs (optional).

    • Run output of apps that represents the execution sequence of a flow (optional).

    • Job events of apps generated by checkpoints of a flow (optional).

  • Enables:

    • Centralized log aggregation

    • Advanced search, filtering, and analytics

    • Faster root cause analysis

→ Datadog log management allows teams to collect, process, and analyze large volumes of log data efficiently.

3. Application Metrics Streaming

  • Streams key runtime performance metrics, including:

    • CPU utilization of the app

    • Memory usage of the app

  • Provides continuous visibility into system health and performance

→ Metrics in Datadog are time-series data points used to monitor system health and performance over time.

How It Works

  1. Apps generate logs and metrics

  2. qibb Platform automatically collects telemetry data

  3. Data is streamed to Datadog ingestion endpoints

  4. Datadog processes and visualizes the data in dashboards

This integration provides telemetry streaming and example dashboards for Datadog.

Creation and management of dashboards, monitors, alerts, retention policies, and Datadog resources remain the responsibility of the customer.

Datadog Dashboard

The following example dashboard is available for import into Datadog. It provides an overview of app logs for the selected time range. It summarizes total log volume, involved apps, runs, and jobs, and breaks logs down by space, app, container, node, and log level. It also includes a log volume timeline and a recent logs table to help users quickly spot activity spikes and inspect the latest log messages.

Datadog-qibb-app-logs-example-dashboard.png
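
The breakdowns shown on the dashboard can be approximated locally as a simple group-and-count over log records. The following is a minimal sketch, not the dashboard's actual query: the sample records and their values are made up for illustration, and only the field names (`level`, `qibb_appId`, `qibb_spaceId`) are taken from the Log Data Fields section.

```python
from collections import Counter

# Hypothetical sample records. Field names follow the "Log Data Fields"
# section; all values are invented for illustration.
records = [
    {"qibb_appId": "app-a", "qibb_spaceId": "space-1", "level": "info"},
    {"qibb_appId": "app-a", "qibb_spaceId": "space-1", "level": "error"},
    {"qibb_appId": "app-b", "qibb_spaceId": "space-1", "level": "info"},
    {"qibb_appId": "app-c", "qibb_spaceId": "space-2"},  # most fields are optional
]


def breakdown(records, field):
    """Count records per value of `field`; missing values are grouped as 'unknown'."""
    return Counter(rec.get(field) or "unknown" for rec in records)


by_level = breakdown(records, "level")
by_app = breakdown(records, "qibb_appId")
by_space = breakdown(records, "qibb_spaceId")

print(dict(by_level))
print(dict(by_app))
print(dict(by_space))
```

Because most fields are optional, the sketch groups records without a given field under `unknown` rather than dropping them, so the totals still add up to the overall log volume.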

Log Data Fields

Below are the most common data fields included in the records. Most fields are optional. Data fields may vary by app or platform version.

| Category | Field | Description | Example |
| --- | --- | --- | --- |
| Time | timestamp | Timestamp of the record. | 1779111812501 |
| Time | @timestamp | Formatted timestamp. | 2026-05-18T13:43:32.502Z |
| Log type | type | If empty: log record. If "run": run record. If "job_event": event record for a job. | - |
| Log Information | level | Log level, e.g. info, debug, audit, warn, error. | info |
| Log Information | stream | Log output stream, e.g. stdout or stderr. | stdout |
| Log Information | message | The log message. | The next execution time of Auto Retry task is scheduled for **********. |
| App Information | qibb_appId | ID of the app. | xmtwhy******* |
| App Information | container | The component of the app. Either flow-app-container (the primary component, which powers the low-code workflow engine that runs flows) or flow-app-sidecar (the secondary component, which is responsible for Management API processing and background tasks for jobs). | flow-app-container |
| App Information | qibb_spaceId | ID of the space that contains this app. | of1**** |
| Flow Information | flow_id | ID of the flow tab. | abc************* |
| Node Information | node_id | ID of the node. | xy5df3********** |
| Node Information | node_type | Node type. | qibb-checkpoint |
| Node Information | node_name | Node name. | Checkpoint |
| Flow Message Information (msg) | msg_id | ID of the processed msg object. | 71ccf2bc59****** |
| Run Information | run_id | ID of the run. | yy77WVc1mNJFbu******* |
| Job Information | job_id | ID of the job. | a3rL4ptjpDCVj0v****** |
| Job Information | msg | Only applicable if type=job_event. A JSON object containing event metadata. Typically includes: job_id, event, checkpoint_id, checkpoint_name, checkpoint_type, summary_plain_text. | {"checkpoint_name":"Wait for approval","queue_type":"WAIT","job_id":"eh74L84596Vt3eG*****","checkpoint_id":"ae5df382911*****","checkpoint_type":"WAIT","event_level":"INFO","summary_plain_text":"Wait for approval: Job awaits approval.","type":"job_event","event":"WAIT","attempt":0,"timestamp":"2026-05-18T13:43:32.501Z"} |
| Infrastructure | qibb_cluster_id | ID of the cluster that hosts this app. | clx********** |
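
As an illustration of how these fields might be consumed downstream, the following minimal sketch classifies a record by its `type` field, converts the numeric `timestamp` to an ISO 8601 string, and extracts the job event metadata from `msg`. It rests on assumptions: that `timestamp` is in Unix epoch milliseconds (the example values above are consistent with that), and that `msg` may arrive either as a JSON string or as an already-parsed object. The function names are hypothetical and not part of any qibb or Datadog API.

```python
import json
from datetime import datetime, timezone


def record_kind(record):
    """Map the `type` field to a record kind; an empty or missing type means a plain log."""
    kind = record.get("type") or ""
    return {"": "log", "run": "run", "job_event": "job_event"}.get(kind, "unknown")


def to_iso(timestamp_ms):
    """Convert epoch milliseconds (assumed unit) to an ISO 8601 UTC string."""
    seconds, millis = divmod(timestamp_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def job_event_details(record):
    """Return selected job event metadata, or None if the record is not a job event."""
    if record_kind(record) != "job_event":
        return None
    msg = record.get("msg") or {}
    if isinstance(msg, str):  # assumption: msg may be delivered as a JSON string
        msg = json.loads(msg)
    keys = ("job_id", "event", "checkpoint_name", "summary_plain_text")
    return {key: msg.get(key) for key in keys}


# Sample record built from the masked example values in the table above.
sample = {
    "type": "job_event",
    "timestamp": 1779111812501,
    "msg": json.dumps({
        "checkpoint_name": "Wait for approval",
        "job_id": "eh74L84596Vt3eG*****",
        "event": "WAIT",
        "summary_plain_text": "Wait for approval: Job awaits approval.",
    }),
}

print(record_kind(sample))          # job_event
print(to_iso(sample["timestamp"]))  # 2026-05-18T13:43:32.501Z
print(job_event_details(sample))
```

Since most fields are optional, the sketch reads every field with `.get()` and treats unrecognized `type` values as `unknown` instead of raising an error.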

Metrics

The following metrics are the standard set that can be provided to monitor apps.

Metric names may change in future releases. Exact names vary by app and platform version.

| Category | Metric | Availability | Description |
| --- | --- | --- | --- |
| Resource usage | container_cpu_usage_seconds_total | Default | Total CPU time consumed by the app container. Can be used to monitor CPU utilization over time. |
| Resource usage | container_memory_working_set_bytes | Default | Current memory working set of the app container. Can be used to monitor memory consumption and detect memory pressure. |
| Resource limits | kube_pod_container_resource_limits{resource="cpu"} | Optional | Configured CPU limit of the app container. Helps compare CPU usage against the configured app limit. |
| Resource limits | kube_pod_container_resource_limits{resource="memory"} | Optional | Configured memory limit of the app container. Helps compare memory usage against the configured app limit. |
| Lifecycle | kube_pod_container_status_restarts_total | Optional | Number of app container restarts. Indicates unstable apps, crash loops, or repeated failures. Note: Users may also restart containers during administrative actions like app restarts or upgrades. |
| API / Network usage | envoy_cluster_<app>_external_upstream_rq_completed | Default | Total number of completed requests routed to the app through the qibb API gateway. |
| API / Network usage | envoy_cluster_<app>_external_upstream_rq_2xx | Default | Number of successful responses returned by the app. |
| API / Network usage | envoy_cluster_<app>_external_upstream_rq_4xx | Default | Number of client error responses, for example invalid requests, unauthorized requests, or missing resources. |
| API / Network usage | envoy_cluster_<app>_external_upstream_rq_5xx | Default | Number of server error responses, typically indicating application or upstream failures. |
| API / Network usage | envoy_cluster_<app>_external_upstream_rq_3xx | Optional | Number of redirect responses returned by the app. |
| API / Network usage | envoy_cluster_<app>_external_upstream_rq_time | Optional | Request processing duration for requests routed to the app. Can be used to monitor latency percentiles where available. |
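
Several of these metrics are totals that only become meaningful when compared across two points in time or against a limit. The following minimal sketch shows the arithmetic, assuming the values behave as cumulative counters (as they do at their Prometheus-style source) and that the CPU limit is expressed in cores. How the values are stored and queried in Datadog depends on the integration configuration, and the function names are hypothetical.

```python
def counter_rate(earlier, later, interval_seconds):
    """Per-second rate of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = later - earlier
    if delta < 0:
        # The counter went backwards, e.g. after a container restart:
        # treat the later value as the increase since the reset.
        delta = later
    return delta / interval_seconds


def cpu_utilization(cpu_seconds_earlier, cpu_seconds_later, interval_seconds, cpu_limit_cores):
    """Fraction of the configured CPU limit used during the interval."""
    cores_used = counter_rate(cpu_seconds_earlier, cpu_seconds_later, interval_seconds)
    return cores_used / cpu_limit_cores


def memory_utilization(working_set_bytes, memory_limit_bytes):
    """Fraction of the configured memory limit currently in use."""
    return working_set_bytes / memory_limit_bytes


def server_error_ratio(completed_delta, rq_5xx_delta):
    """Share of completed requests that ended in a 5xx response."""
    return rq_5xx_delta / completed_delta if completed_delta else 0.0


# Illustrative numbers only:
# container_cpu_usage_seconds_total rose from 120 s to 150 s within 60 s,
# with a CPU limit of 2 cores.
cpu = cpu_utilization(120.0, 150.0, 60, cpu_limit_cores=2.0)
# 768 MiB working set against a 1 GiB memory limit.
mem = memory_utilization(768 * 1024**2, 1024**3)
# 200 completed requests in the interval, 5 of them 5xx.
errors = server_error_ratio(200, 5)

print(f"CPU: {cpu:.0%}, memory: {mem:.0%}, 5xx ratio: {errors:.1%}")
```

The reset handling in `counter_rate` is a deliberate simplification: after a restart the counter starts again from zero, so the later sample is used as the increase rather than reporting a negative rate.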

  • Metrics are available through a curated allowlist managed by qibb’s DevSecOps team for SaaS deployments. Additional metrics can be exposed on request for PaaS deployments, depending on availability.

  • Tags provide customer-safe attribution and enable filtering or aggregation at the app, space, and cluster level, as well as a breakdown by app component (pod or container).

  • Optional metrics may increase Datadog custom metric usage depending on the number of emitted tag value combinations.
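
To illustrate why tag combinations matter: Datadog counts each unique combination of metric name and tag values as a separate custom metric. The sketch below counts such combinations for one optional metric. The tag keys `app` and `container` and the app names are hypothetical placeholders; the tags actually emitted by the integration may differ.

```python
from itertools import product


def count_custom_metrics(series):
    """Count distinct (metric name, tag set) combinations in a list of series."""
    return len({(name, frozenset(tags.items())) for name, tags in series})


# Hypothetical tag values for illustration.
apps = ["app-a", "app-b", "app-c"]
containers = ["flow-app-container", "flow-app-sidecar"]

series = [
    ("kube_pod_container_status_restarts_total", {"app": app, "container": container})
    for app, container in product(apps, containers)
]
# A repeated data point for an existing combination does not add a new series.
series.append(series[0])

total = count_custom_metrics(series)
print(total)  # 3 apps x 2 containers = 6
```

Adding another tag with many distinct values multiplies this count, which is why enabling optional metrics with fine-grained tags can raise custom metric usage noticeably.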

Datadog Usage Estimate

Datadog usage varies primarily with the number of qibb apps and with the number of executed runs and jobs. The figures below give a rough estimate for a typical environment:

  • 1 Infrastructure Host per qibb cluster (representing the data service that collects and pushes the data to Datadog)

  • 10-35+ Custom Metrics per qibb app (depending on the enabled metrics, emitted tags, and Datadog configuration)

  • 1M+ Indexed Logs per 100 qibb apps per day.

Note that all Datadog costs are billed directly by Datadog and are the customer's responsibility.
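
The figures above can be turned into a rough sizing calculation. The following is a minimal sketch that simply scales the stated per-app and per-cluster numbers. Because the source figures are given as "10-35+" and "1M+", the results should be read as a starting range rather than an upper bound. The function name and parameters are hypothetical.

```python
def estimate_usage(app_count, clusters=1, metrics_per_app=(10, 35),
                   logs_per_100_apps_per_day=1_000_000):
    """Scale the typical-environment figures to a given number of apps and clusters."""
    low, high = metrics_per_app
    return {
        "infrastructure_hosts": clusters,  # 1 host per qibb cluster
        "custom_metrics_low": app_count * low,
        "custom_metrics_high": app_count * high,
        "indexed_logs_per_day": app_count * logs_per_100_apps_per_day // 100,
    }


# Illustrative environment: 250 apps spread over 2 clusters.
estimate = estimate_usage(250, clusters=2)
for key, value in estimate.items():
    print(f"{key}: {value:,}")
```

Actual usage depends on the enabled metrics, the emitted tags, and how many runs and jobs the apps execute, so it is worth comparing such an estimate against the usage reported in the Datadog account after the integration has been running for a few days.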
