Capacity Planning
Introduction
Capacity planning uses historical metrics to anticipate resource needs. This helps avoid saturation and schedule infrastructure expansions ahead of time.
Prerequisites
- Prometheus with at least 30 days of history (the `[30d]` queries on this page need it; the default retention of 15 days is not enough)
- Grafana configured
- Familiarity with OpenStack metrics
Learning points
Key metrics for capacity planning
graph TB
subgraph Compute["Compute Resources"]
vcpu[vCPU<br/>Used vs Total<br/>Allocation ratio]
ram[Memory<br/>Used vs Available<br/>Overcommit]
instances[Instances<br/>Count, growth rate]
end
subgraph Storage["Storage Resources"]
ceph_cap[Ceph Capacity<br/>Used vs Total<br/>Growth rate]
volumes[Volumes<br/>Count, total size]
images[Images<br/>Size, count]
end
subgraph Network["Network Resources"]
ips[Floating IPs<br/>Used vs quota]
ports[Ports<br/>Count by network]
bandwidth[Bandwidth<br/>Peak usage]
end
subgraph Infrastructure
nodes[Hypervisor Nodes<br/>Load, capacity]
db_size[Database Size<br/>Growth rate]
mq_queues[Message Queues<br/>Depth, rate]
end
Capacity planning dashboard
{
"title": "Capacity Planning",
"panels": [
{
"title": "vCPU Utilization Trend",
"type": "timeseries",
"targets": [
{
"expr": "sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100",
"legendFormat": "Current"
},
{
"expr": "predict_linear(sum(openstack_nova_vcpus_used)[30d:1d], 86400*30) / sum(openstack_nova_vcpus_total) * 100",
"legendFormat": "Predicted (30d)"
}
],
"thresholds": [
{"value": 70, "color": "yellow"},
{"value": 85, "color": "red"}
]
},
{
"title": "Memory Utilization Trend",
"type": "timeseries",
"targets": [
{
"expr": "sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes) * 100",
"legendFormat": "Current"
},
{
"expr": "predict_linear(sum(openstack_nova_memory_used_bytes)[30d:1d], 86400*30) / sum(openstack_nova_memory_total_bytes) * 100",
"legendFormat": "Predicted (30d)"
}
]
},
{
"title": "Storage Growth",
"type": "timeseries",
"targets": [
{
"expr": "ceph_cluster_total_used_bytes / 1099511627776",
"legendFormat": "Current (TB)"
},
{
"expr": "predict_linear(ceph_cluster_total_used_bytes[30d:1d], 86400*90) / 1099511627776",
"legendFormat": "Predicted 90d (TB)"
}
]
},
{
"title": "Days Until Full (Storage)",
"type": "stat",
"targets": [
{
"expr": "(ceph_cluster_total_bytes - ceph_cluster_total_used_bytes) / deriv(ceph_cluster_total_used_bytes[30d]) / 86400"
}
],
"thresholds": [
{"value": null, "color": "red"},
{"value": 30, "color": "yellow"},
{"value": 90, "color": "green"}
]
}
]
}
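The "Predicted" series in the dashboard rely on `predict_linear`, which fits a straight line through the samples in the window and extrapolates it. The following is a minimal Python sketch of that idea, not Prometheus's actual implementation: the sample history is hypothetical, and the sketch extrapolates from the last sample's timestamp, which is a simplification.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    return its value seconds_ahead after the last sample's timestamp."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t                    # value change per second
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


DAY = 86400
# Hypothetical history: 30 daily samples, 400 used vCPUs growing by 2 per day.
history = [(day * DAY, 400 + 2 * day) for day in range(30)]
print(round(predict_linear(history, 30 * DAY)))
```

Because the fit is linear, a recent change in growth rate (a new tenant, a cleanup campaign) is averaged over the whole window, so the projection reacts slowly to it.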
PromQL queries for capacity planning
# === Compute ===
# vCPU usage rate
sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100
# Memory usage rate
sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes) * 100
# Instance growth rate (per day)
deriv(openstack_nova_running_vms[7d]) * 86400
# Predicted vCPU usage in 30 days
predict_linear(sum(openstack_nova_vcpus_used)[30d:1d], 86400*30)
# === Storage ===
# Current storage usage %
ceph_cluster_total_used_bytes / ceph_cluster_total_bytes * 100
# Storage growth rate (GB/day)
deriv(ceph_cluster_total_used_bytes[7d]) / 1073741824 * 86400
# Days until 85% full
(ceph_cluster_total_bytes * 0.85 - ceph_cluster_total_used_bytes) /
deriv(ceph_cluster_total_used_bytes[30d]) / 86400
# Volume count growth
deriv(openstack_cinder_volumes[7d]) * 86400
# === Network ===
# Floating IP usage
sum(openstack_neutron_floatingips_used) / sum(openstack_neutron_floatingips_total) * 100
# Port growth rate
deriv(openstack_neutron_ports[7d]) * 86400
# === Infrastructure ===
# Hypervisor load average
avg(node_load5) by (instance)
# Database size growth (MB/day)
deriv(mysql_global_status_data_length_bytes[7d]) / 1048576 * 86400
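The "days until 85% full" query divides the remaining headroom by the growth rate. A small Python sketch of the same arithmetic makes the edge cases explicit; the cluster figures are hypothetical, and the handling of zero or negative growth is an assumption about the desired behaviour, since the PromQL expression itself would return a negative or infinite value there.

```python
import math


def days_until_threshold(total_bytes, used_bytes, growth_bytes_per_second,
                         threshold=0.85):
    """Days until used_bytes reaches threshold * total_bytes,
    assuming the growth rate stays constant."""
    headroom = total_bytes * threshold - used_bytes
    if headroom <= 0:
        return 0.0          # threshold already reached
    if growth_bytes_per_second <= 0:
        return math.inf     # usage is flat or shrinking
    return headroom / growth_bytes_per_second / 86400


TIB = 1024 ** 4
GIB = 1024 ** 3
# Hypothetical cluster: 500 TiB total, 275 TiB used, growing by 512 GiB per day.
growth = 512 * GIB / 86400
print(days_until_threshold(500 * TIB, 275 * TIB, growth))  # about 300 days
```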
Planning thresholds
| Resource | Warning threshold | Critical threshold | Action |
|---|---|---|---|
| vCPU | 70% | 85% | Add compute nodes |
| RAM | 70% | 85% | Add compute nodes |
| Storage | 70% | 85% | Add OSDs |
| Floating IPs | 80% | 95% | Extend the pool |
| DB size | 70% of capacity | 85% of capacity | Archive/purge |
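The threshold table can be encoded as data so that a report or a script applies it consistently. This is an illustrative sketch only: the dictionary keys and the `classify` helper are hypothetical names, and the comparison is strict (`>`) to match the alert expressions on this page.

```python
# (warning %, critical %, action) per resource, copied from the table above.
THRESHOLDS = {
    "vcpu": (70, 85, "Add compute nodes"),
    "ram": (70, 85, "Add compute nodes"),
    "storage": (70, 85, "Add OSDs"),
    "floating_ips": (80, 95, "Extend the pool"),
    "db_size": (70, 85, "Archive/purge"),
}


def classify(resource, usage_percent):
    """Return (severity, action) for a usage percentage."""
    warning, critical, action = THRESHOLDS[resource]
    if usage_percent > critical:
        return ("critical", action)
    if usage_percent > warning:
        return ("warning", action)
    return ("ok", None)


print(classify("vcpu", 72))
print(classify("floating_ips", 79))
```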
Capacity planning alerts
# /etc/kolla/config/prometheus/rules/capacity-alerts.yml
groups:
  - name: capacity-planning
    rules:
      - alert: ComputeCapacityWarning
        expr: sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100 > 70
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "vCPU usage at {{ $value | humanize }}%"
          description: "Consider adding compute nodes"
          runbook_url: "https://wiki.example.com/capacity/compute"
      - alert: ComputeCapacityCritical
        expr: sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100 > 85
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "vCPU usage critical at {{ $value | humanize }}%"
          description: "Immediate action required"
      - alert: StorageCapacityWarning
        expr: ceph_cluster_total_used_bytes / ceph_cluster_total_bytes * 100 > 70
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Storage usage at {{ $value | humanize }}%"
      # The "> 0" filter drops the series when usage is flat or shrinking;
      # otherwise a negative growth rate yields a negative day count and
      # the alert fires spuriously.
      - alert: StorageWillFillSoon
        expr: |
          (ceph_cluster_total_bytes - ceph_cluster_total_used_bytes) /
          (deriv(ceph_cluster_total_used_bytes[30d]) > 0) / 86400 < 60
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Storage will be full in {{ $value | humanize }} days"
          description: "Based on 30-day growth trend"
      - alert: MemoryCapacityWarning
        expr: sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes) * 100 > 70
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Memory usage at {{ $value | humanize }}%"
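Each rule above carries a `for:` clause, so a short spike over the threshold does not fire the alert: the condition has to hold across consecutive evaluations for the whole duration. The sketch below is a simplified model of that behaviour, not Prometheus's rule engine; the 15-minute evaluation interval and the usage values are hypothetical.

```python
def firing_since(samples, threshold, for_seconds):
    """Return the evaluation timestamp at which the alert starts firing,
    or None if the condition never holds for for_seconds without a break.
    samples: list of (timestamp_seconds, value) in evaluation order."""
    pending_since = None
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts          # alert becomes pending
            if ts - pending_since >= for_seconds:
                return ts                   # pending long enough: firing
        else:
            pending_since = None            # condition broke: reset
    return None


# Hypothetical vCPU usage (%) evaluated every 15 minutes.
usage = [69, 71, 72, 69, 71, 72, 73, 74, 75]
samples = [(i * 900, v) for i, v in enumerate(usage)]
print(firing_since(samples, threshold=70, for_seconds=3600))
```

In this series the first excursion above 70% is interrupted, so the one-hour clock restarts at the second excursion.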
Capacity report
#!/bin/bash
# capacity-report.sh
PROMETHEUS="http://10.0.0.10:9090"
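# Run an instant query and print the value of the first result.
# -G with --data-urlencode encodes PromQL characters such as spaces,
# brackets, and "*" that would otherwise break the URL.
query() {
    curl -s -G --data-urlencode "query=$1" "${PROMETHEUS}/api/v1/query" \
        | jq -r '.data.result[0].value[1]'
}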
echo "=========================================="
echo " CAPACITY REPORT - $(date '+%Y-%m-%d')"
echo "=========================================="
echo -e "\n=== COMPUTE ==="
VCPU_USED=$(query "sum(openstack_nova_vcpus_used)")
VCPU_TOTAL=$(query "sum(openstack_nova_vcpus_total)")
# Multiply before dividing: with scale=1, bc truncates the quotient to one decimal first.
VCPU_PCT=$(echo "scale=1; $VCPU_USED * 100 / $VCPU_TOTAL" | bc)
echo "vCPU: ${VCPU_USED}/${VCPU_TOTAL} (${VCPU_PCT}%)"
MEM_USED=$(query "sum(openstack_nova_memory_used_bytes) / 1073741824")
MEM_TOTAL=$(query "sum(openstack_nova_memory_total_bytes) / 1073741824")
MEM_PCT=$(echo "scale=1; $MEM_USED * 100 / $MEM_TOTAL" | bc)
echo "Memory: ${MEM_USED}GB/${MEM_TOTAL}GB (${MEM_PCT}%)"
# sum() so that per-hypervisor series are aggregated; query() only reads the first result.
INSTANCES=$(query "sum(openstack_nova_running_vms)")
echo "Running Instances: ${INSTANCES}"
echo -e "\n=== STORAGE ==="
STORAGE_USED=$(query "ceph_cluster_total_used_bytes / 1099511627776")
STORAGE_TOTAL=$(query "ceph_cluster_total_bytes / 1099511627776")
STORAGE_PCT=$(echo "scale=1; $STORAGE_USED * 100 / $STORAGE_TOTAL" | bc)
echo "Ceph: ${STORAGE_USED}TB/${STORAGE_TOTAL}TB (${STORAGE_PCT}%)"
VOLUMES=$(query "openstack_cinder_volumes")
echo "Volumes: ${VOLUMES}"
echo -e "\n=== PROJECTIONS (30 days) ==="
VCPU_PRED=$(query "predict_linear(sum(openstack_nova_vcpus_used)[30d:1d], 86400*30)")
echo "vCPU in 30d: ${VCPU_PRED}"
STORAGE_GROWTH=$(query "deriv(ceph_cluster_total_used_bytes[30d]) / 1073741824 * 86400")
echo "Storage growth: ${STORAGE_GROWTH} GB/day"
DAYS_UNTIL_FULL=$(query "(ceph_cluster_total_bytes - ceph_cluster_total_used_bytes) / deriv(ceph_cluster_total_used_bytes[30d]) / 86400")
echo "Days until storage 100%: ${DAYS_UNTIL_FULL}"
echo -e "\n=== RECOMMENDATIONS ==="
if (( $(echo "$VCPU_PCT > 70" | bc -l) )); then
echo "⚠️ Consider adding compute nodes (vCPU > 70%)"
fi
if (( $(echo "$STORAGE_PCT > 70" | bc -l) )); then
echo "⚠️ Consider adding storage (Ceph > 70%)"
fi
if (( $(echo "$DAYS_UNTIL_FULL < 90" | bc -l) )); then
echo "⚠️ Storage will be full in less than 90 days"
fi
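The report script sends PromQL expressions to the Prometheus HTTP API and reads `.data.result[0].value[1]` from the response. As an illustration of those two steps, here is a Python sketch using only the standard library; the helper names are hypothetical, and the base URL is the one used in the script. It shows why the expression must be URL-encoded and how an empty result can be handled explicitly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, expr):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def first_value(payload):
    """Extract the first sample value from a query response, or None."""
    results = payload.get("data", {}).get("result", [])
    if not results:
        return None
    return float(results[0]["value"][1])   # values arrive as strings


def query(base_url, expr, timeout=10):
    """Run an instant query against Prometheus (performs a network call)."""
    with urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return first_value(json.load(resp))


print(build_query_url("http://10.0.0.10:9090", "sum(openstack_nova_vcpus_used)"))
```

Returning `None` for an empty result lets the caller skip a line of the report instead of feeding the string `null` into later arithmetic.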
Trend diagram
graph LR
today["Today<br/>vCPU: 65%<br/>RAM: 60%<br/>Storage: 55%"]
d30["30 days<br/>vCPU: 72%<br/>RAM: 67%<br/>Storage: 62%"]
d60["60 days<br/>vCPU: 79%<br/>RAM: 74%<br/>Storage: 69%"]
d90["90 days<br/>vCPU: 86% ⚠️<br/>RAM: 81%<br/>Storage: 76%"]
today -->|+7%| d30
d30 -->|+7%| d60
d60 -->|+7%| d90
note1[Action required:<br/>Add 2 compute nodes<br/>before day 60]
d90 -.-> note1
style d90 fill:#ff9999
style note1 fill:#ffff99
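The figures in the diagram follow from a constant growth of 7 percentage points per 30 days. A short Python sketch of that arithmetic, assuming the growth stays linear, shows when each threshold is crossed; the function names are hypothetical.

```python
def project(current_percent, growth_points_per_30d, days):
    """Usage percentage after the given number of days."""
    return current_percent + growth_points_per_30d * days / 30


def days_to_reach(current_percent, growth_points_per_30d, target_percent):
    """Days until usage reaches target_percent at constant growth."""
    if current_percent >= target_percent:
        return 0.0
    if growth_points_per_30d <= 0:
        return float("inf")
    return (target_percent - current_percent) / growth_points_per_30d * 30


# vCPU figures from the diagram: 65% today, +7 points every 30 days.
print(project(65, 7, 90))                    # usage at day 90
print(round(days_to_reach(65, 7, 70), 1))    # days until the 70% warning
print(round(days_to_reach(65, 7, 85), 1))    # days until the 85% critical
```

With these inputs the warning threshold is crossed after roughly three weeks and the critical threshold shortly before day 90, which is why the diagram places the expansion deadline around day 60.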
Practical examples
Weekly report export
# Cron job for the weekly report (Mondays at 08:00)
0 8 * * 1 /opt/scripts/capacity-report.sh | mail -s "Weekly Capacity Report" ops@example.com
Recording rules for performance
# /etc/kolla/config/prometheus/rules/capacity-recording.yml
groups:
  - name: capacity-recording
    interval: 5m
    rules:
      - record: capacity:vcpu:usage_ratio
        expr: sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total)
      - record: capacity:memory:usage_ratio
        expr: sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes)
      - record: capacity:storage:usage_ratio
        expr: ceph_cluster_total_used_bytes / ceph_cluster_total_bytes
      - record: capacity:storage:growth_rate_bytes_per_day
        expr: deriv(ceph_cluster_total_used_bytes[7d]) * 86400
      - record: capacity:instances:growth_rate_per_day
        expr: deriv(openstack_nova_running_vms[7d]) * 86400
Checkpoint
- Capacity metrics collected
- Capacity planning dashboard created
- Threshold alerts configured
- Recording rules in place for projections
- Report script automated
- Alert thresholds documented
- Expansion procedures defined