Grafana: Dashboards and Visualization
Grafana on Docker: datasources, dashboards, alerting, provisioning, PromQL panels, and Loki
Docker Deployment

    # docker-compose.yml
    services:
      grafana:
        image: grafana/grafana:10.4.2
        ports: ["3000:3000"]
        environment:
          GF_SECURITY_ADMIN_USER: admin
          GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_PASSWORD}"
          GF_USERS_ALLOW_SIGN_UP: "false"
          GF_SERVER_ROOT_URL: "http://localhost:3000"
          GF_SMTP_ENABLED: "true"
          GF_SMTP_HOST: "smtp.example.com:587"
          GF_SMTP_FROM_ADDRESS: "grafana@example.com"
        volumes:
          - grafana_data:/var/lib/grafana
          - ./grafana/provisioning:/etc/grafana/provisioning
          - ./grafana/dashboards:/var/lib/grafana/dashboards

      loki:
        image: grafana/loki:3.0.0
        ports: ["3100:3100"]
        volumes:
          - ./grafana/loki-config.yml:/etc/loki/local-config.yaml
          - loki_data:/loki
        command: -config.file=/etc/loki/local-config.yaml

      promtail:
        image: grafana/promtail:3.0.0
        volumes:
          - /var/log:/var/log:ro
          - /var/lib/docker/containers:/var/lib/docker/containers:ro
          # Required by the docker_sd_configs job in promtail-config.yml
          - /var/run/docker.sock:/var/run/docker.sock:ro
          - ./grafana/promtail-config.yml:/etc/promtail/config.yml
        command: -config.file=/etc/promtail/config.yml

    volumes:
      grafana_data:
      loki_data:

    docker compose up -d grafana loki promtail
    # UI: http://localhost:3000 (admin / password from .env)

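After `docker compose up -d`, Grafana can take a few seconds to become reachable. The following is a minimal sketch of a readiness check, assuming the `/api/health` endpoint on `http://localhost:3000` and that a healthy instance reports `"database": "ok"` in its JSON body. The function names `fetch_health` and `wait_until_healthy`, the retry count, and the delay are illustrative choices, not part of the original setup.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:3000/api/health") -> dict:
    """Return the decoded JSON body of Grafana's health endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(fetch: Callable[[], dict] = fetch_health,
                       attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `fetch` until the payload reports database == "ok".

    Returns True as soon as the check passes, False after `attempts` tries.
    """
    for attempt in range(attempts):
        try:
            if fetch().get("database") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a body that is not JSON yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with no arguments polls the local instance. Passing a custom `fetch` callable makes the loop easy to exercise without a running container.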
Provisioning: Datasources

    # grafana/provisioning/datasources/datasources.yml
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus:9090
        isDefault: true
        jsonData:
          timeInterval: "15s"

      - name: Loki
        type: loki
        access: proxy
        url: http://loki:3100
        jsonData:
          maxLines: 1000

      - name: Elasticsearch
        type: elasticsearch
        access: proxy
        url: http://elasticsearch:9200
        basicAuth: true
        basicAuthUser: elastic
        secureJsonData:
          # ELASTIC_PASSWORD must be set in the Grafana container's environment
          basicAuthPassword: "${ELASTIC_PASSWORD}"
        jsonData:
          index: "logstash-*"
          timeField: "@timestamp"
          esVersion: "8.0.0"

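Provisioning: Dashboards

    # grafana/provisioning/dashboards/dashboards.yml
    apiVersion: 1
    providers:
      - name: default
        type: file
        # "folder" is left unset: it must be empty for foldersFromFilesStructure
        # to work. Put the JSON files in a subdirectory (e.g. dashboards/Infra/)
        # to get an "Infra" folder in Grafana.
        options:
          path: /var/lib/grafana/dashboards
          foldersFromFilesStructure: true
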
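Because the file provider loads every JSON file under the configured path, it can help to check the files before Grafana sees them. The sketch below is one possible pre-commit check, not something Grafana ships: it assumes dashboards are plain JSON objects with a top-level `uid` field, and it treats a missing or duplicated `uid` as a problem worth reporting. The function name `check_dashboards` is hypothetical.

```python
import json
from pathlib import Path


def check_dashboards(root: str) -> list[str]:
    """Return human-readable problems found in dashboard files under `root`."""
    problems: list[str] = []
    seen: dict[str, Path] = {}  # uid -> first file that used it
    for path in sorted(Path(root).rglob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(dashboard, dict):
            problems.append(f"{path}: top-level value is not an object")
            continue
        uid = dashboard.get("uid")
        if not uid:
            problems.append(f"{path}: missing uid")
        elif uid in seen:
            problems.append(f"{path}: uid {uid!r} already used by {seen[uid]}")
        else:
            seen[uid] = path
    return problems
```

For example, `check_dashboards("./grafana/dashboards")` returns an empty list when every file parses and carries a distinct `uid`.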
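Loki: Configuration

    # grafana/loki-config.yml
    auth_enabled: false

    server:
      http_listen_port: 3100

    ingester:
      lifecycler:
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1

    schema_config:
      configs:
        - from: 2024-01-01
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /loki/index
        cache_location: /loki/index_cache
      filesystem:
        directory: /loki/chunks

    # retention_period only takes effect when the compactor runs with retention enabled
    compactor:
      working_directory: /loki/compactor
      retention_enabled: true
      delete_request_store: filesystem

    limits_config:
      retention_period: 30d
      ingestion_rate_mb: 16
      max_query_series: 5000
      # Loki 3.x rejects a pre-v13 schema unless structured metadata is disabled
      allow_structured_metadata: false
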
Promtail: Configuration

    # grafana/promtail-config.yml
    server:
      http_listen_port: 9080

    positions:
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
      - job_name: system
        static_configs:
          - targets: [localhost]
            labels:
              job: syslog
              __path__: /var/log/syslog

      - job_name: auth
        static_configs:
          - targets: [localhost]
            labels:
              job: auth
              __path__: /var/log/auth.log

      - job_name: docker
        docker_sd_configs:
          - host: unix:///var/run/docker.sock
            refresh_interval: 5s
        relabel_configs:
          # Docker reports names as "/nginx"; strip the leading slash
          - source_labels: [__meta_docker_container_name]
            regex: '/(.*)'
            target_label: container
          - source_labels: [__meta_docker_container_log_stream]
            target_label: stream

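Docker service discovery exposes container names with a leading slash (for example `/nginx`), so a relabel rule with `regex: '/(.*)'` is commonly used to obtain a clean `container` label. The sketch below imitates the `replace` relabel action in plain Python to show what such a rule does. It is a simplified model under stated assumptions: Python's `re` stands in for the RE2 engine, `\1` stands in for the `$1` replacement syntax, and dropping of empty label values is not modeled. The function name `relabel_replace` is hypothetical.

```python
import re


def relabel_replace(labels: dict, source_labels: list, target_label: str,
                    regex: str = "(.*)", replacement: str = r"\1",
                    separator: str = ";") -> dict:
    """Apply one 'replace' relabel step and return a new label set.

    The source label values are joined with `separator`, the regex must
    match the whole joined value, and on a match the expanded replacement
    is written to `target_label`. Without a match nothing changes.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)
    result = dict(labels)
    result[target_label] = match.expand(replacement)
    return result


discovered = {"__meta_docker_container_name": "/nginx"}
cleaned = relabel_replace(discovered, ["__meta_docker_container_name"],
                          "container", regex="/(.*)")
```

With the default regex `(.*)` the slash would be kept, which is why a selector such as `{container="nginx"}` would not match without the explicit regex.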
Common PromQL Panels

    # Gauge: CPU usage (%)
    100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

    # Time series: inbound network traffic (Mibit/s)
    rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8 / 1024 / 1024

    # Stat: server uptime (days)
    (node_time_seconds - node_boot_time_seconds) / 3600 / 24

    # Bar gauge: disk usage per volume (%)
    (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100

    # Table: running containers with RAM usage (MiB)
    container_memory_usage_bytes{name!=""} / 1024 / 1024

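The CPU gauge query is easier to trust once the arithmetic is spelled out. The sketch below reproduces it on hand-made numbers: each core's idle counter is sampled twice, the per-second increase is averaged across cores, and the result is subtracted from 100. It is a simplification that ignores counter resets and the extrapolation Prometheus applies inside `rate()`; the function names and sample values are made up for illustration.

```python
def rate(first: float, last: float, seconds: float) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (last - first) / seconds


def cpu_busy_percent(idle_samples: list, seconds: float) -> float:
    """Mirror of: 100 - (avg(rate(idle_counter[window])) * 100).

    `idle_samples` holds one (first, last) pair of idle-seconds counter
    values per core, taken `seconds` apart.
    """
    idle_rates = [rate(first, last, seconds) for first, last in idle_samples]
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


# Two cores over a 300 s window: idle for 270 s and 240 s respectively,
# i.e. idle fractions of 0.9 and 0.8, so roughly 15 % busy on average.
busy = cpu_busy_percent([(1000.0, 1270.0), (2000.0, 2240.0)], 300.0)
```

The idle rate of one core is the fraction of each second it spent idle, which is why multiplying the average by 100 and subtracting from 100 yields a busy percentage.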
LogQL: Loki Queries

    # All failed SSH logins
    {job="auth"} |= "Failed password"

    # Logs for one container
    {container="nginx"} |= "error"

    # Parse Nginx logs (pattern)
    {job="nginx"} | pattern `<ip> - - [<date>] "<method> <uri> <proto>" <status> <size>` | status >= 500

    # Count 5xx errors per minute
    sum(count_over_time({job="nginx"} | pattern `<_> - - [<_>] "<_>" <status> <_>` | status >= 500 [1m]))

    # Filter by log level
    {container="app"} | json | level="error"

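A pattern expression is easiest to debug against a real log line before it goes into a panel. The sketch below is a rough Python regex equivalent of the Nginx pattern above, useful for checking which field ends up where. It is not how Loki implements `pattern`, and the sample line is invented; it assumes the access log format with `- -` after the client address.

```python
import re
from typing import Optional

# Named groups mirror the captures <ip>, <date>, <method>, <uri>, <proto>,
# <status> and <size> from the LogQL pattern.
NGINX_LINE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the captured fields, or None when the line does not match."""
    match = NGINX_LINE.match(line)
    return match.groupdict() if match else None


def is_server_error(line: str) -> bool:
    """Equivalent of the `| status >= 500` filter stage."""
    fields = parse_line(line)
    return fields is not None and int(fields["status"]) >= 500


sample = ('203.0.113.7 - - [10/May/2024:13:55:36 +0000] '
          '"GET /api/items HTTP/1.1" 502 157 "-" "curl/8.5.0"')
fields = parse_line(sample)
```

Here `fields["status"]` is the string `"502"`, so the line passes the 5xx filter. A pattern that only splits on spaces would instead pick up the `-` after the address as the status.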
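Grafana Alerting (Unified Alerting)

    # Via the API: create a Slack contact point
    curl -X POST http://admin:password@localhost:3000/api/v1/provisioning/contact-points \
      -H "Content-Type: application/json" \
      -d '{
        "name": "slack-ops",
        "type": "slack",
        "settings": {
          "url": "<SLACK_WEBHOOK_URL>",
          "recipient": "#alerts"
        }
      }'

    # Create an alert rule via the API
    # - "condition" must name a refId present in "data"
    # - "datasourceUid" must be the UID of the Prometheus datasource
    #   ("__expr__" is reserved for server-side expressions)
    # - a folder with UID "infra" must already exist
    curl -X POST http://admin:password@localhost:3000/api/v1/provisioning/alert-rules \
      -H "Content-Type: application/json" \
      -d '{
        "title": "CPU High",
        "ruleGroup": "infra-alerts",
        "folderUID": "infra",
        "orgID": 1,
        "condition": "A",
        "for": "5m",
        "noDataState": "OK",
        "execErrState": "Error",
        "data": [{
          "refId": "A",
          "datasourceUid": "<PROMETHEUS_DATASOURCE_UID>",
          "relativeTimeRange": {"from": 600, "to": 0},
          "model": {
            "refId": "A",
            "instant": true,
            "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100) > 85"
          }
        }],
        "labels": {"severity": "warning"},
        "annotations": {"summary": "High CPU"}
      }'
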
Useful Commands

    # Check Grafana health
    curl -s http://localhost:3000/api/health | jq

    # List datasources
    curl -s http://admin:password@localhost:3000/api/datasources | jq '.[].name'

    # Export a dashboard (take the UID from its URL)
    curl -s "http://admin:password@localhost:3000/api/dashboards/uid/<UID>" | jq '.dashboard' > dashboard.json

    # Import a dashboard
    curl -X POST http://admin:password@localhost:3000/api/dashboards/import \
      -H "Content-Type: application/json" \
      -d '{"dashboard": <JSON>, "overwrite": true, "folderId": 0}'

    # Check Loki (log selectors need the range endpoint; -G with
    # --data-urlencode keeps curl from interpreting the braces)
    curl -s http://localhost:3100/ready
    curl -s -G http://localhost:3100/loki/api/v1/query_range \
      --data-urlencode 'query={job="auth"}' \
      --data-urlencode 'limit=5' | jq

    # Check Promtail (port 9080 must be published, or run this inside the container)
    curl -s http://localhost:9080/metrics | grep promtail_sent_entries

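When the curl calls above move into a script, the Basic-auth and JSON handling are the parts that tend to be repeated. The sketch below builds the same kind of request with Python's standard library. The helper names `grafana_request` and `call`, the default base URL, and the `admin`/`password` credentials are placeholders taken from the commands above, not a recommended setup.

```python
import base64
import json
import urllib.request
from typing import Optional


def grafana_request(path: str, payload: Optional[dict] = None,
                    base_url: str = "http://localhost:3000",
                    user: str = "admin",
                    password: str = "password") -> urllib.request.Request:
    """Build a GET (no payload) or JSON POST (with payload) request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def call(request: urllib.request.Request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

For instance, `call(grafana_request("/api/datasources"))` would return the datasource list as Python objects against a running instance. Keeping request construction separate from sending makes the helper testable offline.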
Recommended Dashboards (import by ID)

| Dashboard | Grafana.com ID | Usage |
|---|---|---|
| Node Exporter Full | 1860 | Linux system metrics |
| cAdvisor | 14282 | Docker container metrics |
| Docker | 893 | Docker overview |
| Loki Dashboard | 13639 | Aggregated logs |
| Blackbox Exporter | 7587 | HTTP/TCP probes |
| Alertmanager | 9578 | Alert status |

    # Import from Grafana.com via the API.
    # The import endpoint expects the dashboard JSON itself, so download it first.
    curl -s https://grafana.com/api/dashboards/1860/revisions/latest/download -o dashboard-1860.json
    jq '{dashboard: ., overwrite: true, folderId: 0,
         inputs: [{name: "DS_PROMETHEUS", type: "datasource", pluginId: "prometheus", value: "Prometheus"}]}' \
      dashboard-1860.json |
      curl -X POST http://admin:password@localhost:3000/api/dashboards/import \
        -H "Content-Type: application/json" -d @-

Use provisioning (YAML files under provisioning/) to version datasources and dashboards in Git. This avoids configuration drift and makes redeployments reproducible.
Change the default admin password and keep self-registration disabled (GF_USERS_ALLOW_SIGN_UP=false). In production, serve Grafana over HTTPS behind a reverse proxy and set GF_SERVER_ROOT_URL to the public FQDN.