Andromeda can push scoped metrics directly to Grafana Cloud, VictoriaMetrics, or any Prometheus-compatible remote-write endpoint.
Figure: Grafana Explore view with PromQL queries and time-series output.

What gets pushed

The feed is filtered to your assigned nodes and namespaces. Typically this includes:
  • All GPU metrics (DCGM_FI_*)
  • Container CPU and memory
  • Slurm job allocation metrics
  • Node metrics for assigned nodes
The exact metric set is configured for your environment.
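
To check which of the categories above actually arrive at your endpoint, you could bucket incoming metric names by prefix. The following is a minimal sketch, not part of any Andromeda tooling: only the `DCGM_FI_` prefix comes from the list above, while the `container_`, `slurm_`, and `node_` prefixes are assumptions based on common exporter naming, and the sample metric names are illustrative.

```python
from collections import Counter

# Prefix-to-category map. Only DCGM_FI_ is taken from the list above; the
# other prefixes are ASSUMPTIONS and may not match your environment.
CATEGORY_PREFIXES = {
    "gpu": ("DCGM_FI_",),
    "container": ("container_",),
    "slurm": ("slurm_",),
    "node": ("node_",),
}


def categorize(metric_name: str) -> str:
    """Return the category whose prefix matches, or "other" if none does."""
    for category, prefixes in CATEGORY_PREFIXES.items():
        if metric_name.startswith(prefixes):  # startswith accepts a tuple
            return category
    return "other"


def count_by_category(metric_names):
    """Count how many metric names fall into each category."""
    return Counter(categorize(name) for name in metric_names)


# Illustrative names only; substitute the names your endpoint receives.
sample_names = [
    "DCGM_FI_DEV_GPU_UTIL",
    "container_memory_working_set_bytes",
    "node_cpu_seconds_total",
    "slurm_jobs_running",  # hypothetical Slurm metric name
    "up",
]
print(count_by_category(sample_names))
```

If a category you requested shows a count of zero, adjust the prefixes to your environment before concluding that the metrics are missing.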

Set up remote write

Share endpoint URLs and authentication details only through the approved secure support channel. Do not paste credentials into docs, screenshots, tickets, or chat messages unless Andromeda Support explicitly instructs you to use that channel.
  1. Share the remote-write endpoint URL and authentication method securely with Andromeda Support.
  2. Specify which metric categories are needed: GPU, container, Slurm, node, or all.
  3. Andromeda configures and deploys the integration.
  4. Metrics begin flowing within minutes.
No changes are required to your cluster workloads.

Expectations

Remote-write volume and downstream cost depend on the metric categories you select. Estimate the ingest rate before enabling broad GPU, container, Slurm, and node metric sets.
  • Latency: Samples arrive on a ~15-30 second cadence, matching the scrape intervals.
  • Volume: Roughly 1,000-5,000 samples/sec per 8-GPU node, depending on metric selection. Verify that the endpoint can handle the ingest rate.
  • Labels: Metrics arrive with original labels (cluster, namespace, node, pod, gpu, etc.). Relabeling can be applied on the receiving side.
  • Cost: If the endpoint charges per series or per sample, estimate the cost from the Metrics Reference. GPU metrics alone produce ~18 series per GPU x 8 GPUs = ~144 series per node.
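
The GPU series arithmetic above can be sketched as a small estimator. This is a rough illustration under stated assumptions, not Andromeda's sizing method: the function names, the 4-node cluster, and the 15-second interval are hypothetical, and only the ~18 series per GPU and 8 GPUs per node come from the text. It covers GPU series only, so treat the result as a lower bound when container, Slurm, or node metrics are also enabled.

```python
def gpu_series_per_node(series_per_gpu: int = 18, gpus_per_node: int = 8) -> int:
    """Active GPU series for one node, using the approximate figures above."""
    return series_per_gpu * gpus_per_node


def samples_per_second(series: int, scrape_interval_s: float) -> float:
    """Each series yields one sample per scrape, so rate = series / interval."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape_interval_s must be positive")
    return series / scrape_interval_s


nodes = 4             # hypothetical cluster size
interval_s = 15       # assumed scrape interval, low end of the 15-30 s range

per_node = gpu_series_per_node()
total_series = per_node * nodes
rate = samples_per_second(total_series, interval_s)

print(f"GPU series per node: {per_node}")
print(f"GPU series across {nodes} nodes: {total_series}")
print(f"GPU samples/sec at a {interval_s} s interval: {rate:.1f}")
```

For a per-series pricing model, multiply the total series count by the provider's rate; for per-sample pricing, multiply the samples/sec figure by the billing period in seconds.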