Monitoring

Every product exposes health and metrics endpoints for monitoring.

Health Endpoints

Endpoint	Purpose	Use for
`/health`	Full health check with component status	Dashboards, alerting
`/health/live`	Liveness probe (is the process running?)	Kubernetes liveness probe
`/health/ready`	Readiness probe (can it serve traffic?)	Kubernetes readiness probe, load balancers

# Full health check
curl http://localhost:8100/health
# {"status":"healthy","product":"haagsman-document-search","version":"1.0.0","uptime_seconds":3600,"checks":{"vectorstore":"ok"}}
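
A deployment script can poll the readiness endpoint before it sends traffic. The following is a minimal sketch, assuming `curl` is installed and that `/health/ready` answers with a non-2xx status while the service is not ready. The function names `probe` and `wait_ready` and the retry settings are illustrative and not part of the product.

```shell
# probe URL: succeed only if the endpoint answers with a 2xx status.
# -f makes curl exit non-zero on HTTP errors, -s silences progress output.
probe() {
    curl -fs --max-time 5 -o /dev/null "$1" 2>/dev/null
}

# wait_ready URL [ATTEMPTS] [INTERVAL_SECONDS]
# Polls until probe succeeds; returns 1 once ATTEMPTS probes have failed.
wait_ready() {
    url=$1
    attempts=${2:-30}
    interval=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if probe "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Example (hypothetical retry settings):
#   wait_ready http://localhost:8100/health/ready 30 2 && echo "ready"
```

Keeping the HTTP call in its own function makes the retry loop easy to reuse for `/health/live` or for a different client.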

Prometheus Metrics

Every product exposes metrics at `/metrics` in the Prometheus text format:

curl http://localhost:8100/metrics
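
For a quick look without a Prometheus server, the text output can be summed directly in the shell. This sketch assumes each sample line ends with its value (no trailing timestamp) and that counters hold integer values. The helper name `sum_metric` and the `status="500"` label filter are illustrative, since the exact label names are not specified here.

```shell
# sum_metric NAME [LABEL_FILTER]
# Reads Prometheus text-format metrics on stdin and sums every sample of NAME
# whose line contains LABEL_FILTER (a plain substring such as 'status="500"').
sum_metric() {
    awk -v name="$1" -v filter="${2-}" '
        /^#/ { next }                      # skip HELP and TYPE comment lines
        {
            n = $0
            sub(/[{ ].*/, "", n)           # metric name ends at "{" or the first space
            if (n != name) next
            if (filter != "" && index($0, filter) == 0) next
            total += $NF                   # the sample value is the last field
        }
        END { printf "%d\n", total }
    '
}

# Example (assumes a label named "status"):
#   curl -s http://localhost:8100/metrics | sum_metric haagsman_http_requests_total
#   curl -s http://localhost:8100/metrics | sum_metric haagsman_http_requests_total 'status="500"'
```

Counters are cumulative since process start, so these sums describe the lifetime of the process rather than a recent time window.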

Key metrics

Metric	Type	Description
`haagsman_http_requests_total`	Counter	Total HTTP requests (by method, endpoint, status)
`haagsman_http_latency_seconds`	Histogram	Request latency (by method, endpoint)
`haagsman_llm_requests_total`	Counter	LLM API calls (by provider, model, status)
`haagsman_llm_latency_seconds`	Histogram	LLM response time
`haagsman_llm_tokens_total`	Counter	Token usage (by provider, direction)
`haagsman_documents_indexed`	Gauge	Documents in vector store

Grafana integration

Add Prometheus as a data source in Grafana, then import or build dashboards using the metrics above. Useful panels:

Request rate and error rate over time
LLM latency p50/p95/p99
Token usage and cost estimation
Active documents indexed
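
Before a dashboard exists, token cost can be approximated from the raw counter. The sketch below assumes `haagsman_llm_tokens_total` carries a `direction` label with the values `input` and `output` (the metric is only described as split by direction), and the per-1,000-token prices are placeholders to replace with the provider's actual rates. The helper name `estimate_cost` is illustrative.

```shell
# estimate_cost INPUT_PRICE_PER_1K OUTPUT_PRICE_PER_1K
# Reads Prometheus text-format metrics on stdin and prints the estimated cost
# of all tokens counted so far, using the two prices given as arguments.
estimate_cost() {
    awk -v pin="$1" -v pout="$2" '
        /^haagsman_llm_tokens_total[{ ]/ {
            # Label values "input" and "output" are assumptions.
            if (index($0, "direction=\"input\""))  tin  += $NF
            if (index($0, "direction=\"output\"")) tout += $NF
        }
        END { printf "%.4f\n", tin / 1000 * pin + tout / 1000 * pout }
    '
}

# Example (placeholder prices per 1,000 tokens):
#   curl -s http://localhost:8100/metrics | estimate_cost 0.5 1.5
```

The result covers all tokens since process start. For cost per time period, a Grafana panel would typically apply the PromQL `increase()` function to the same counter.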

Docker health checks

All containers include built-in Docker health checks:

# View container health
docker inspect hai-document-search --format='{{.State.Health.Status}}'
# healthy

# View recent health check results
docker inspect hai-document-search --format='{{json .State.Health}}' | jq
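
In scripts, the health status is easier to act on as an exit code. The sketch below maps the three states Docker reports (`starting`, `healthy`, `unhealthy`) to return codes. The function names and the chosen exit codes are illustrative.

```shell
# container_health NAME: print the Docker health status of a container.
container_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# check_container NAME
# Exit codes: 0 healthy, 1 unhealthy, 2 still starting,
# 3 unknown (container missing or no health check defined).
check_container() {
    status=$(container_health "$1" 2>/dev/null) || status=""
    case "$status" in
        healthy)   return 0 ;;
        unhealthy) return 1 ;;
        starting)  return 2 ;;
        *)         return 3 ;;
    esac
}

# Example:
#   check_container hai-document-search || echo "document search is not healthy"
```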

Alerting recommendations

Condition	Severity	Action
`/health` returns non-200	Critical	Investigate immediately
LLM error rate > 5%	Warning	Check API key, provider status
p99 latency > 10s	Warning	Check LLM provider, consider scaling
Disk usage > 80%	Warning	Archive old data, increase storage
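
Where no alerting stack is in place yet, two of these thresholds can be checked from a cron job. This is a rough sketch. It assumes failed LLM calls are counted with the label `status="error"` (the actual label value may differ), and the helper names `llm_error_pct`, `exceeds`, and `disk_usage_pct` are illustrative.

```shell
# llm_error_pct: read Prometheus text-format metrics on stdin and print the
# percentage of haagsman_llm_requests_total samples that carry status="error".
llm_error_pct() {
    awk '
        /^haagsman_llm_requests_total[{ ]/ {
            total += $NF
            if (index($0, "status=\"error\"")) errors += $NF
        }
        END {
            if (total > 0) printf "%.1f\n", 100 * errors / total
            else print "0.0"
        }
    '
}

# exceeds VALUE THRESHOLD: succeed when VALUE is strictly greater than THRESHOLD.
exceeds() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# disk_usage_pct [PATH]: used capacity of the filesystem holding PATH, without "%".
disk_usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example:
#   pct=$(curl -s http://localhost:8100/metrics | llm_error_pct)
#   exceeds "$pct" 5 && echo "WARNING: LLM error rate is ${pct}%"
#   exceeds "$(disk_usage_pct /)" 80 && echo "WARNING: disk usage above 80%"
```

Because the counter is cumulative, this reports the error share since process start. An alert on the recent error rate needs the difference between two scrapes, which Prometheus alert rules compute natively.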

Logging

All products log to stdout in a structured, single-line format:

2026-04-08T09:30:00 [INFO] haagsman: POST /api/v1/search/query 200 1.243s

View logs:

docker compose logs -f document-search
docker compose logs --since 1h email-triage
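
Because request lines end with the status code and duration, slow requests can be filtered with a short `awk` helper. This sketch assumes every request line follows the sample format shown above. The name `slow_requests` and the 5-second threshold in the example are illustrative.

```shell
# slow_requests THRESHOLD_SECONDS
# Reads log lines on stdin and prints method, path, status, and duration for
# requests slower than the threshold. Fields are counted from the end of the
# line, so a "service | " prefix added by docker compose does not shift them.
slow_requests() {
    awk -v limit="$1" '
        $NF ~ /^[0-9]+(\.[0-9]+)?s$/ {     # last field looks like "1.243s"
            secs = $NF
            sub(/s$/, "", secs)
            if (secs + 0 > limit + 0)
                print $(NF - 3), $(NF - 2), $(NF - 1), $NF
        }
    '
}

# Example:
#   docker compose logs --since 1h document-search | slow_requests 5
```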

Configure the log level via `HAAGSMAN_LOG_LEVEL`. Accepted values are `DEBUG`, `INFO`, `WARNING`, and `ERROR`.