Public inference observability

Gonka Power

Track latency, reliability, and the real-world behavior of decentralized AI models from different providers in Gonka.

Providers: 0 (observed gateways)
Available models: 0 (unique models across gateways)
Median API uptime: n/a (across the last 24 hours)
Minimum price: n/a (per 1 million tokens)

Gonka Provider Overview

Compact scorecards for uptime, latency, context handling, throughput, and price.

Recent Incidents

Latest incident windows opened by health probe failures.

Methodology and Probe Tiers
  • Probes run from a single environment, so latency is comparable across providers here but not globally representative.
  • Health status changes after repeated failures or recoveries to reduce false alarms.
  • Capability checks are shown separately from API uptime so feature gaps do not distort availability.
  • Preview cards use estimated all-in spend, while provider detail cards show token pricing snapshots.
  • API uptime uses core health probes only: basic chat and streaming health.
  • Performance reliability tracks short latency, TTFT, throughput, and stability separately from uptime.
  • Context reliability tracks 2k, 8k, and 32k pass rates separately from core API health.
  • Real-world gen combines practical reply, long-stream, and heavy-report generation probes to show whether a provider can sustain real working completions.
  • Reply generation is a medium-length practical answer probe; Long stream is a longer streaming completion probe; Heavy report is a long non-stream structured report probe.
  • 32k is treated as a rare stress tier, and 64k remains calibration-only.
  • Temporarily unavailable means a provider hit rate limits or routing pressure during a probe, not necessarily a missing feature.
  • API uptime goes down when a core health probe fails. That includes no model response, HTTP 4xx/5xx errors, timeouts, failed validation, or protocol-level failures.
  • Latency shows the median full-response time on the standard short non-stream probe over the last 24 hours. The p95 line shows the slower tail, not an average.
  • Output speed is the median effective completion speed on the standard medium-generation throughput probe. It reflects both generation speed and request overhead.
  • Context reliability is the average success rate across the 2k, 8k, and 32k context probes. The percentage drops when the provider returns an error, times out, fails validation, or does not complete the request correctly.
  • Real-world gen falls when any practical generation probe fails due to timeouts, 5xx responses, truncated output, or missing required topics.
  • Heavy report is primarily a completion test: it checks whether the provider can finish a long working report, not just return a fast short answer.
  • Real spend / 1M estimates all-in USD spend per 1M tokens, including configured cash-in fees. For direct GNK wallet billing, the dashboard normalizes GNK-denominated pricing into USD via the live GNK / USDT midpoint. Provider detail cards keep a separate Token price / 1M (API) metric.
  • Provider card ranking uses a weighted score: Failed probes 30%, API uptime (24h) 30%, Real spend / 1M 20%, Output speed 10%, and Heavy report 10%.
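The weighted ranking above can be sketched as a small scoring function. This is a minimal illustration, not the dashboard's implementation: the normalization of raw metrics into 0..1 terms (and the `max_spend` / `max_speed` caps) are assumptions, since the methodology only documents the weights.

```python
def provider_score(failed_probe_rate: float,
                   api_uptime_24h: float,
                   real_spend_per_1m: float,
                   output_speed_tps: float,
                   heavy_report_pass_rate: float,
                   max_spend: float = 10.0,
                   max_speed: float = 200.0) -> float:
    """Weighted provider card score in 0..1, higher is better.

    Weights follow the stated methodology: failed probes 30%,
    API uptime (24h) 30%, real spend / 1M 20%, output speed 10%,
    heavy report 10%. Normalization here is illustrative.
    """
    # Lower-is-better metrics are inverted into 0..1 "goodness" terms.
    probe_term = 1.0 - min(max(failed_probe_rate, 0.0), 1.0)
    spend_term = 1.0 - min(max(real_spend_per_1m, 0.0) / max_spend, 1.0)
    # Higher-is-better throughput is capped at an assumed ceiling.
    speed_term = min(max(output_speed_tps, 0.0) / max_speed, 1.0)
    return (0.30 * probe_term
            + 0.30 * api_uptime_24h
            + 0.20 * spend_term
            + 0.10 * speed_term
            + 0.10 * heavy_report_pass_rate)
```

For example, a provider with no failed probes, 100% uptime, negligible spend, throughput at the cap, and a perfect heavy-report pass rate scores 1.0, while each weighted term degrades the score proportionally as its metric worsens.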