Monitoring

Every product exposes health and metrics endpoints for monitoring.

Health Endpoints

| Endpoint | Purpose | Use for |
|---|---|---|
| /health | Full health check with component status | Dashboards, alerting |
| /health/live | Liveness probe (is the process running?) | Kubernetes liveness probe |
| /health/ready | Readiness probe (can it serve traffic?) | Kubernetes readiness probe, load balancers |

# Full health check
curl http://localhost:8100/health
# {"status":"healthy","product":"haagsman-document-search","version":"1.0.0","uptime_seconds":3600,"checks":{"vectorstore":"ok"}}
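
For dashboards or scripted checks, the JSON body returned by /health can be evaluated programmatically. The following is a minimal sketch in Python, not part of the product itself. It assumes that "healthy" is the only passing value for status and that any entry in checks other than "ok" counts as a failure; the function names evaluate_health and fetch_health are hypothetical.

```python
import json
import urllib.request


def fetch_health(url: str = "http://localhost:8100/health") -> str:
    """Fetch the raw /health body. urlopen raises HTTPError on non-2xx responses."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def evaluate_health(body: str) -> tuple[bool, list[str]]:
    """Return (is_healthy, failing_components) for a /health response body.

    Assumption: status must equal "healthy" and every check must equal "ok".
    """
    payload = json.loads(body)
    checks = payload.get("checks", {})
    failing = [name for name, state in checks.items() if state != "ok"]
    healthy = payload.get("status") == "healthy" and not failing
    return healthy, failing


# Sample body copied from the curl output above.
sample = (
    '{"status":"healthy","product":"haagsman-document-search",'
    '"version":"1.0.0","uptime_seconds":3600,"checks":{"vectorstore":"ok"}}'
)
print(evaluate_health(sample))  # (True, [])
```

Against a running service, evaluate_health(fetch_health()) would perform the same check on live data.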

Prometheus Metrics

Every product exposes metrics at /metrics in Prometheus format:

curl http://localhost:8100/metrics

Key metrics

| Metric | Type | Description |
|---|---|---|
| haagsman_http_requests_total | Counter | Total HTTP requests (by method, endpoint, status) |
| haagsman_http_latency_seconds | Histogram | Request latency (by method, endpoint) |
| haagsman_llm_requests_total | Counter | LLM API calls (by provider, model, status) |
| haagsman_llm_latency_seconds | Histogram | LLM response time |
| haagsman_llm_tokens_total | Counter | Token usage (by provider, direction) |
| haagsman_documents_indexed | Gauge | Documents in vector store |
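
To inspect these metrics outside Prometheus, the text returned by /metrics can be parsed directly. The sketch below is a deliberately minimal parser for the Prometheus text exposition format: it skips comment lines and ignores optional timestamps. The sample payload is hypothetical; only the metric names and label names (method, endpoint, status) come from the table above, while the values and HELP text are invented for illustration.

```python
import re

# name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# key="value" pairs, allowing backslash-escaped characters inside the value
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse exposition text into (metric_name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical /metrics excerpt (values are made up).
SAMPLE = """\
# HELP haagsman_http_requests_total Total HTTP requests
# TYPE haagsman_http_requests_total counter
haagsman_http_requests_total{method="POST",endpoint="/api/v1/search/query",status="200"} 1520
haagsman_http_requests_total{method="POST",endpoint="/api/v1/search/query",status="500"} 12
haagsman_documents_indexed 4821
"""

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

For production use, an established client library's parser is likely a better choice than a hand-written regex; this sketch only illustrates the shape of the data.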

Grafana integration

Add Prometheus as a data source in Grafana, then import or build dashboards using the metrics above. Useful panels:

  • Request rate and error rate over time
  • LLM latency p50/p95/p99
  • Token usage and cost estimation
  • Active documents indexed
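
The p50/p95/p99 panels above are derived from histogram buckets rather than from raw request timings. As a rough illustration of how such a percentile is estimated, the sketch below linearly interpolates within cumulative buckets, in the spirit of what Prometheus does for histograms. It is a simplified approximation, and the bucket boundaries and counts are hypothetical, not taken from any product.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound,
    ending with an infinite bound that holds the total count.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the overflow bucket: report the last finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets for an LLM latency histogram (seconds).
BUCKETS = [(0.5, 40), (1.0, 70), (2.5, 90), (5.0, 98), (10.0, 100), (math.inf, 100)]

for q in (0.50, 0.95, 0.99):
    print(f"p{int(q * 100)} ~ {estimate_quantile(q, BUCKETS):.2f}s")
```

Because the estimate is interpolated, its accuracy depends on how finely the buckets are spaced around the latencies of interest.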

Docker health checks

All containers include built-in Docker health checks:

# View container health
docker inspect hai-document-search --format='{{.State.Health.Status}}'
# healthy

# View recent health check results
docker inspect hai-document-search --format='{{json .State.Health}}' | jq
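
The JSON printed by the second command can also be summarized in a script. The sketch below assumes the usual shape of Docker's .State.Health object (Status, FailingStreak, and a Log list whose entries carry ExitCode and Output). The helper names and the sample values are hypothetical.

```python
import json
import subprocess


def inspect_health(container: str) -> str:
    """Run docker inspect and return the raw .State.Health JSON for a container."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_health(raw: str) -> dict:
    """Reduce the health JSON to the status and the most recent probe result."""
    health = json.loads(raw)
    log = health.get("Log") or []
    last = log[-1] if log else None
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last["ExitCode"] if last else None,
        "last_output": last["Output"].strip() if last else None,
    }


# Hypothetical health JSON, shaped like the output of the command above.
SAMPLE_HEALTH = json.dumps({
    "Status": "healthy",
    "FailingStreak": 0,
    "Log": [
        {"Start": "2026-04-08T09:29:30Z", "End": "2026-04-08T09:29:30Z",
         "ExitCode": 0, "Output": "ok\n"},
    ],
})

print(summarize_health(SAMPLE_HEALTH))
```

With Docker available, summarize_health(inspect_health("hai-document-search")) would apply the same summary to a live container.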

Alerting recommendations

| Condition | Severity | Action |
|---|---|---|
| /health returns non-200 | Critical | Investigate immediately |
| LLM error rate > 5% | Warning | Check API key, provider status |
| p99 latency > 10s | Warning | Check LLM provider, consider scaling |
| Disk usage > 80% | Warning | Archive old data, increase storage |
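
The thresholds above can be expressed as a small rule function, which may help when prototyping alerts before committing them to an alerting system. This is only a sketch: the Snapshot class and its fields are hypothetical, and how each value is collected (for example, over which time window the LLM error rate is measured) is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical container for values gathered from /health, /metrics and the host."""
    health_status_code: int      # HTTP status returned by /health
    llm_requests: int            # LLM calls in the evaluation window
    llm_errors: int              # failed LLM calls in the same window
    p99_latency_seconds: float
    disk_usage_percent: float


def evaluate_alerts(s: Snapshot) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for every recommendation that fires."""
    alerts = []
    if s.health_status_code != 200:
        alerts.append(("critical", "/health returned non-200: investigate immediately"))
    error_rate = s.llm_errors / s.llm_requests if s.llm_requests else 0.0
    if error_rate > 0.05:
        alerts.append(("warning", "LLM error rate above 5%: check API key and provider status"))
    if s.p99_latency_seconds > 10:
        alerts.append(("warning", "p99 latency above 10s: check LLM provider, consider scaling"))
    if s.disk_usage_percent > 80:
        alerts.append(("warning", "disk usage above 80%: archive old data or increase storage"))
    return alerts


# Hypothetical reading: 30 failures out of 400 LLM calls, disk at 85%.
snapshot = Snapshot(200, 400, 30, 4.2, 85.0)
for severity, message in evaluate_alerts(snapshot):
    print(f"[{severity}] {message}")
```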

Logging

All products log to stdout in a structured format:

2026-04-08T09:30:00 [INFO] haagsman: POST /api/v1/search/query 200 1.243s

View logs:

docker compose logs -f document-search
docker compose logs --since 1h email-triage

Configure log level via HAAGSMAN_LOG_LEVEL (DEBUG, INFO, WARNING, ERROR).
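
Because the log format is structured, request lines can be parsed for ad hoc analysis. The sketch below assumes every request line follows exactly the layout of the example shown earlier (timestamp, level, logger name, method, path, status, duration); lines in any other shape are returned as None. The regular expression and the function name parse_line are illustrative, not part of the product.

```python
import re
from datetime import datetime

LOG_RE = re.compile(
    r"^(?P<ts>\S+) \[(?P<level>[A-Z]+)\] (?P<logger>[^:]+): "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<duration>\d+(?:\.\d+)?)s$"
)


def parse_line(line: str):
    """Parse one request log line into a dict, or return None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.fromisoformat(match["ts"]),
        "level": match["level"],
        "logger": match["logger"],
        "method": match["method"],
        "path": match["path"],
        "status": int(match["status"]),
        "duration_seconds": float(match["duration"]),
    }


# The example line from the Logging section.
LINE = "2026-04-08T09:30:00 [INFO] haagsman: POST /api/v1/search/query 200 1.243s"
print(parse_line(LINE))
```

Piping the output of docker compose logs through such a parser would allow, for instance, filtering for slow requests or non-200 responses.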