---
title: "Observability"
description: "Prometheus metrics, status fields, and monitoring the Remediator Agent in production."
diataxis: how-to
applies_to:
  product: "nirmata-ai-agents"
audience: ["platform-engineer"]
last_updated: 2026-04-16
url: https://docs.nirmata.io/docs/control-hub/agent-hub/service-agents/observability/
---


## Prometheus Metrics

The Remediator Agent's controller manager serves Prometheus metrics over HTTPS at `/metrics` on port 8443.

### Available Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `remediator_reconciles_total` | Counter | `result="success\|error"` | Total number of reconciliation runs |
| `remediator_reconcile_duration_seconds` | Histogram | `result="success\|error"` | Duration of each reconciliation run |

### Enable ServiceMonitor

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: go-agent-remediator-metrics
  namespace: go-agent-remediator-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
    - port: https
      path: /metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
```

### Access Metrics Directly

```bash
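# Terminal 1: forward the metrics port (this command blocks until interrupted)
kubectl -n go-agent-remediator-system port-forward \
  deploy/go-agent-remediator-controller-manager 8443:8443

# Terminal 2: request a short-lived service account token and scrape the endpoint
SA=go-agent-remediator-controller-manager
NS=go-agent-remediator-system
TOKEN=$(kubectl -n "$NS" create token "$SA")
# -k skips TLS certificate verification for the forwarded localhost endpoint
curl -k -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics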
```
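To spot-check the counter without a Prometheus server, the scraped text can be summarized locally. The sketch below is a hypothetical helper (`reconcile_success_pct` is not part of the agent) that reads the Prometheus text exposition format from stdin and prints the percentage of successful reconciliations. It assumes samples carry no trailing timestamp, so the value is the last field on each line.

```bash
# Hypothetical helper, not shipped with the agent.
# Reads Prometheus text exposition from stdin and prints the share of
# successful reconciliations as a percentage with one decimal place,
# or "n/a" when no remediator_reconciles_total samples are present.
reconcile_success_pct() {
  awk '
    # Match sample lines only; "# HELP" and "# TYPE" lines start with "#".
    /^remediator_reconciles_total[{ ]/ {
      total += $NF                               # assumes no trailing timestamp
      if ($0 ~ /result="success"/) ok += $NF
    }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * ok / total
    }
  '
}
```

With the port-forward from the previous example still running, pipe the scrape into it: `curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics | reconcile_success_pct`. The counters are cumulative since the controller started, so this is a lifetime ratio rather than a windowed rate.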

### Example Queries

```promql
# Success rate over the last hour
sum(rate(remediator_reconciles_total{result="success"}[1h]))
/ sum(rate(remediator_reconciles_total[1h]))

# P95 reconciliation latency
histogram_quantile(0.95,
  sum by (le) (rate(remediator_reconcile_duration_seconds_bucket[1h]))
)
```

---

## Remediator Status

The `Remediator` resource reports detailed status about each run.

```bash
# View full status
kubectl get remediator remediator-argo-hub -n nirmata -o yaml

# View just the last run summary
kubectl get remediator remediator-argo-hub -n nirmata \
  -o jsonpath='{.status.lastRunSummary}' | jq
```

### Status Fields

| Field | Description |
|-------|-------------|
| `phase` | Current operational phase: `Running`, `Idle`, or `Failed` |
| `lastScheduleTime` | When the last remediation was scheduled |
| `lastSuccessfulTime` | When the last successful run completed |
| `nextScheduledTime` | When the next run is scheduled |
| `conditions` | Step-by-step workflow tracking with collector information |
| `lastRunSummary.startTime` / `lastRunSummary.endTime` | Start and end timestamps of the run |
| `lastRunSummary.status` | Success or failure |
| `lastRunSummary.message` | Human-readable outcome |
| `lastRunSummary.targetsProcessed` | Number of targets scanned |
| `lastRunSummary.violationsFound` | Total violations discovered |
| `lastRunSummary.remediationPlans` | Number of AI-generated plans produced |
| `lastRunSummary.actionsExecuted` | Number of actions taken, such as pull requests created |
| `lastRunSummary.errors` | Any errors encountered |

### Example Status Query
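
The `phase` field lends itself to a simple scripted health check. The following is a minimal sketch, assuming only the three phase values listed in the table above; `phase_exit_code` and its exit-code mapping are hypothetical choices for illustration, not part of the agent.

```bash
# Hypothetical helper, not shipped with the agent.
# Maps a Remediator phase to an exit code for use in a health check:
#   0 = healthy (Running or Idle), 1 = Failed, 2 = unknown or empty phase.
phase_exit_code() {
  case "$1" in
    Running|Idle) return 0 ;;
    Failed)       return 1 ;;
    *)            return 2 ;;
  esac
}
```

A cron job or CI step could call it as `phase_exit_code "$(kubectl get remediator remediator-argo-hub -n nirmata -o jsonpath='{.status.phase}')"` and alert on a non-zero exit. Treating an empty phase as a distinct code keeps "resource not found" separate from an actual `Failed` run.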

```bash
kubectl get remediator remediator-argo-hub -n nirmata \
  -o jsonpath='{.status.lastRunSummary}' | jq '{
  status: .status,
  violations: .violationsFound,
  plans: .remediationPlans,
  actions: .actionsExecuted,
  errors: .errors
}'
```

---

## Logs

```bash
# Follow live logs
kubectl logs -n nirmata -l app.kubernetes.io/name=nirmata-agent -f

# Last 100 lines
kubectl logs -n nirmata -l app.kubernetes.io/name=nirmata-agent --tail=100
```

---

## Support Matrix

| Component | Supported |
|-----------|-----------|
| **Kubernetes** | Any CNCF-conformant distribution, v1.20 or later, including on-premises clusters |
| **AI providers** | Nirmata AI (default), AWS Bedrock, Azure OpenAI |
| **GitOps** | ArgoCD |
| **VCS** | GitHub (App & PAT), GitLab (Enterprise & SaaS) |
| **Manifests** | YAML files, simple Helm charts |


