
Capacity Planning

Introduction

Capacity planning uses historical metrics to anticipate resource needs. This makes it possible to avoid saturation and to plan infrastructure expansions ahead of time.

Prerequisites

Learning objectives

Key metrics for capacity planning

graph TB
    subgraph Compute["Compute Resources"]
        vcpu[vCPU<br/>Used vs Total<br/>Allocation ratio]
        ram[Memory<br/>Used vs Available<br/>Overcommit]
        instances[Instances<br/>Count, growth rate]
    end

    subgraph Storage["Storage Resources"]
        ceph_cap[Ceph Capacity<br/>Used vs Total<br/>Growth rate]
        volumes[Volumes<br/>Count, total size]
        images[Images<br/>Size, count]
    end

    subgraph Network["Network Resources"]
        ips[Floating IPs<br/>Used vs quota]
        ports[Ports<br/>Count by network]
        bandwidth[Bandwidth<br/>Peak usage]
    end

    subgraph Infrastructure
        nodes[Hypervisor Nodes<br/>Load, capacity]
        db_size[Database Size<br/>Growth rate]
        mq_queues[Message Queues<br/>Depth, rate]
    end
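Before building dashboards, it can help to confirm that the metrics listed above are actually exported. The sketch below assumes the metric names used on this page and compares them with the list returned by the Prometheus endpoint `/api/v1/label/__name__/values`; the Prometheus URL is a placeholder, and the fetch step needs `curl` and `jq`.

```bash
#!/usr/bin/env bash
# Sketch: report which capacity metrics are absent from a Prometheus server.
# Assumption: these are the metric names used on this page; adjust them to
# whatever your exporters actually expose.
set -euo pipefail

EXPECTED_METRICS=(
  openstack_nova_vcpus_used
  openstack_nova_vcpus_total
  openstack_nova_memory_used_bytes
  openstack_nova_memory_total_bytes
  openstack_nova_running_vms
  openstack_cinder_volumes
  ceph_cluster_total_bytes
  ceph_cluster_total_used_bytes
)

# Reads one metric name per line on stdin and prints every expected
# metric that does not appear in that list.
missing_metrics() {
  local available m
  available=$(cat)
  for m in "${EXPECTED_METRICS[@]}"; do
    if ! grep -Fqx -- "$m" <<<"$available"; then
      echo "$m"
    fi
  done
}

# Fetches all metric names from Prometheus and pipes them, one per line,
# into missing_metrics. The default URL is a placeholder.
check_prometheus() {
  local prometheus="${1:-http://10.0.0.10:9090}"
  curl -s "${prometheus}/api/v1/label/__name__/values" \
    | jq -r '.data[]' \
    | missing_metrics
}
```

Example: `check_prometheus http://10.0.0.10:9090` prints nothing when every metric is present, otherwise one missing name per line.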

Capacity planning dashboard

{
  "title": "Capacity Planning",
  "panels": [
    {
      "title": "vCPU Utilization Trend",
      "type": "timeseries",
      "targets": [
        {
          "expr": "sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100",
          "legendFormat": "Current"
        },
        {
          "expr": "predict_linear(sum(openstack_nova_vcpus_used)[30d:1d], 86400*30) / sum(openstack_nova_vcpus_total) * 100",
          "legendFormat": "Predicted (30d)"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"value": null, "color": "green"},
              {"value": 70, "color": "yellow"},
              {"value": 85, "color": "red"}
            ]
          }
        }
      }
    },
    {
      "title": "Memory Utilization Trend",
      "type": "timeseries",
      "targets": [
        {
          "expr": "sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes) * 100",
          "legendFormat": "Current"
        },
        {
          "expr": "predict_linear(sum(openstack_nova_memory_used_bytes)[30d:1d], 86400*30) / sum(openstack_nova_memory_total_bytes) * 100",
          "legendFormat": "Predicted (30d)"
        }
      ],
      "fieldConfig": {
        "defaults": {"unit": "percent"}
      }
    },
    {
      "title": "Storage Growth",
      "type": "timeseries",
      "targets": [
        {
          "expr": "ceph_cluster_total_used_bytes / 1099511627776",
          "legendFormat": "Current (TiB)"
        },
        {
          "expr": "predict_linear(ceph_cluster_total_used_bytes[30d:1d], 86400*90) / 1099511627776",
          "legendFormat": "Predicted 90d (TiB)"
        }
      ]
    },
    {
      "title": "Days Until Full (Storage)",
      "type": "stat",
      "targets": [
        {
          "expr": "(ceph_cluster_total_bytes - ceph_cluster_total_used_bytes) / (deriv(ceph_cluster_total_used_bytes[30d]) > 0) / 86400"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"value": null, "color": "red"},
              {"value": 30, "color": "yellow"},
              {"value": 90, "color": "green"}
            ]
          }
        }
      }
    }
  ]
}
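One way to load the dashboard JSON above is Grafana's HTTP API, which accepts a dashboard wrapped in a `{"dashboard": ..., "overwrite": ...}` envelope on `POST /api/dashboards/db`. This is a minimal sketch: `GRAFANA_URL`, `GRAFANA_TOKEN` (an API or service account token) and the file name `capacity-planning.json` are assumptions, not values from this deployment.

```bash
#!/usr/bin/env bash
# Sketch: push the capacity dashboard to Grafana through its HTTP API.
# GRAFANA_URL, GRAFANA_TOKEN and the dashboard file name are placeholders.
set -euo pipefail

# Wraps a dashboard JSON file in the envelope expected by /api/dashboards/db.
# "overwrite": true lets Grafana replace an existing dashboard instead of
# rejecting the import as a duplicate.
build_payload() {
  local dashboard_file="$1"
  printf '{"dashboard": %s, "overwrite": true}\n' "$(cat "$dashboard_file")"
}

# Sends the payload read from stdin (--data-binary @-);
# --fail makes HTTP errors return a non-zero exit status.
import_dashboard() {
  local dashboard_file="$1"
  build_payload "$dashboard_file" | curl --fail -s \
    -H "Authorization: Bearer ${GRAFANA_TOKEN:?set GRAFANA_TOKEN}" \
    -H "Content-Type: application/json" \
    -X POST --data-binary @- \
    "${GRAFANA_URL:-http://localhost:3000}/api/dashboards/db"
}
```

Usage: `GRAFANA_TOKEN=... import_dashboard capacity-planning.json`.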

PromQL queries for capacity planning

# === Compute ===

# vCPU usage rate
sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100

# Memory usage rate
sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes) * 100

# Instance growth rate (per day), summed across hypervisors first
deriv(sum(openstack_nova_running_vms)[7d:1h]) * 86400

# Predicted vCPU usage in 30 days
predict_linear(sum(openstack_nova_vcpus_used)[30d:1d], 86400*30)

# === Storage ===

# Current storage usage %
ceph_cluster_total_used_bytes / ceph_cluster_total_bytes * 100

# Storage growth rate (GiB/day)
deriv(ceph_cluster_total_used_bytes[7d]) / 1073741824 * 86400

# Days until 85% full (no result while usage is flat or shrinking)
(ceph_cluster_total_bytes * 0.85 - ceph_cluster_total_used_bytes) /
  (deriv(ceph_cluster_total_used_bytes[30d]) > 0) / 86400

# Volume count growth
deriv(openstack_cinder_volumes[7d]) * 86400

# === Network ===

# Floating IP usage
sum(openstack_neutron_floatingips_used) / sum(openstack_neutron_floatingips_total) * 100

# Port growth rate
deriv(openstack_neutron_ports[7d]) * 86400

# === Infrastructure ===

# Hypervisor load average
avg(node_load5) by (instance)

# Database size growth (MiB/day); needs the mysqld_exporter info_schema.tables collector
deriv(sum(mysql_info_schema_table_size{component=~"data_length|index_length"})[7d:1h]) / 1048576 * 86400
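The "days until full" queries divide the remaining headroom by the daily growth rate that `deriv()` estimates. The same arithmetic can be reproduced offline to sanity-check a dashboard value. The function below is a hypothetical helper taking raw byte counts and a growth rate in bytes per day; it returns `never` when growth is zero or negative instead of the negative or infinite value the raw formula would give.

```bash
#!/usr/bin/env bash
# Sketch: offline version of "days until N% full".
# Hypothetical interface: total bytes, used bytes, growth in bytes/day,
# threshold as a fraction (default 0.85 = 85%).
set -euo pipefail

days_until_threshold() {
  local total="$1" used="$2" growth_per_day="$3" threshold="${4:-0.85}"
  awk -v t="$total" -v u="$used" -v g="$growth_per_day" -v th="$threshold" '
    BEGIN {
      # Flat or shrinking usage never reaches the threshold.
      if (g <= 0) { print "never"; exit }
      d = (t * th - u) / g
      if (d < 0) d = 0   # already above the threshold
      printf "%.1f\n", d
    }'
}
```

For example, a 100 TiB cluster with 60 TiB used and 200 GiB/day of growth gives `days_until_threshold 109951162777600 65970697666560 214748364800`, that is 128.0 days before reaching 85%.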

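Planning thresholds

| Resource     | Warning threshold | Critical threshold | Action            |
|--------------|-------------------|--------------------|-------------------|
| vCPU         | 70%               | 85%                | Add compute nodes |
| RAM          | 70%               | 85%                | Add compute nodes |
| Storage      | 70%               | 85%                | Add OSDs          |
| Floating IPs | 80%               | 95%                | Extend the pool   |
| DB size      | 70% of capacity   | 85% of capacity    | Archive/purge     |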
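The threshold table can be applied mechanically: each resource has a warning and a critical percentage. The sketch below is a hypothetical helper that classifies a utilisation value; it uses strict comparisons (`>`) so that it agrees with the `> 70` and `> 85` expressions of the alert rules on this page.

```bash
#!/usr/bin/env bash
# Sketch: map a utilisation percentage to OK / WARNING / CRITICAL.
# Thresholds are passed in, so the same helper covers vCPU, RAM and
# storage (70/85) as well as floating IPs (80/95).
set -euo pipefail

capacity_level() {
  local pct="$1" warn="$2" crit="$3"
  awk -v p="$pct" -v w="$warn" -v c="$crit" 'BEGIN {
    # Strictly greater than, like the "> 70" / "> 85" alert expressions.
    if (p > c)      print "CRITICAL"
    else if (p > w) print "WARNING"
    else            print "OK"
  }'
}
```

For example, `capacity_level 72.5 70 85` prints `WARNING`; for floating IPs, pass `80 95` as thresholds.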

Capacity planning alerts

# /etc/kolla/config/prometheus/rules/capacity-alerts.yml
groups:
  - name: capacity-planning
    rules:
      - alert: ComputeCapacityWarning
        expr: sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100 > 70
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "vCPU usage at {{ $value | humanize }}%"
          description: "Consider adding compute nodes"
          runbook_url: "https://wiki.example.com/capacity/compute"

      - alert: ComputeCapacityCritical
        expr: sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total) * 100 > 85
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "vCPU usage critical at {{ $value | humanize }}%"
          description: "Immediate action required"

      - alert: StorageCapacityWarning
        expr: ceph_cluster_total_used_bytes / ceph_cluster_total_bytes * 100 > 70
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Storage usage at {{ $value | humanize }}%"

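      - alert: StorageWillFillSoon
        # "> 0" keeps only a positive growth rate; without it, shrinking usage
        # yields a negative number of days, which also satisfies "< 60".
        expr: |
          (ceph_cluster_total_bytes - ceph_cluster_total_used_bytes) /
          (deriv(ceph_cluster_total_used_bytes[30d]) > 0) / 86400 < 60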
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Storage will be full in {{ $value | humanize }} days"
          description: "Based on 30-day growth trend"

      - alert: MemoryCapacityWarning
        expr: sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes) * 100 > 70
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Memory usage at {{ $value | humanize }}%"
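Prometheus refuses to load a rule file that contains YAML or PromQL errors, so it is worth validating the files before a reconfigure. `promtool check rules` performs that check. The loop below is a sketch that assumes `promtool` is installed on the deployment host and that the rule files sit in `/etc/kolla/config/prometheus/rules`, as on this page.

```bash
#!/usr/bin/env bash
# Sketch: validate every rule file in a directory with promtool.
# Assumptions: promtool is on PATH; the default directory is the one used
# on this page.
set -euo pipefail

check_rule_files() {
  local dir="${1:-/etc/kolla/config/prometheus/rules}"
  local status=0 f
  shopt -s nullglob
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    # promtool exits non-zero when the file has YAML or PromQL errors.
    if ! promtool check rules "$f"; then
      echo "invalid: $f" >&2
      status=1
    fi
  done
  shopt -u nullglob
  return "$status"
}
```

Usage: `check_rule_files && echo "all rule files are valid"`.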

Capacity report

#!/bin/bash
# capacity-report.sh
set -euo pipefail

PROMETHEUS="http://10.0.0.10:9090"

# Runs an instant query and prints the first sample value ("null" if none).
# --data-urlencode is required: PromQL contains spaces, brackets and operators.
query() {
    curl -s --get --data-urlencode "query=$1" "${PROMETHEUS}/api/v1/query" \
        | jq -r '.data.result[0].value[1]'
}

echo "=========================================="
echo "  CAPACITY REPORT - $(date '+%Y-%m-%d')"
echo "=========================================="

echo -e "\n=== COMPUTE ==="
VCPU_USED=$(query "sum(openstack_nova_vcpus_used)")
VCPU_TOTAL=$(query "sum(openstack_nova_vcpus_total)")
# Multiply before dividing: bc truncates each division to "scale" digits.
VCPU_PCT=$(echo "scale=1; $VCPU_USED * 100 / $VCPU_TOTAL" | bc)
echo "vCPU: ${VCPU_USED}/${VCPU_TOTAL} (${VCPU_PCT}%)"

MEM_USED=$(query "sum(openstack_nova_memory_used_bytes) / 1073741824")
MEM_TOTAL=$(query "sum(openstack_nova_memory_total_bytes) / 1073741824")
MEM_PCT=$(echo "scale=1; $MEM_USED * 100 / $MEM_TOTAL" | bc)
echo "Memory: ${MEM_USED}GiB/${MEM_TOTAL}GiB (${MEM_PCT}%)"

# openstack_nova_running_vms has one series per hypervisor: sum them.
INSTANCES=$(query "sum(openstack_nova_running_vms)")
echo "Running instances: ${INSTANCES}"

echo -e "\n=== STORAGE ==="
STORAGE_USED=$(query "ceph_cluster_total_used_bytes / 1099511627776")
STORAGE_TOTAL=$(query "ceph_cluster_total_bytes / 1099511627776")
STORAGE_PCT=$(echo "scale=1; $STORAGE_USED * 100 / $STORAGE_TOTAL" | bc)
echo "Ceph: ${STORAGE_USED}TiB/${STORAGE_TOTAL}TiB (${STORAGE_PCT}%)"

VOLUMES=$(query "openstack_cinder_volumes")
echo "Volumes: ${VOLUMES}"

echo -e "\n=== PROJECTIONS (30 days) ==="
VCPU_PRED=$(query "predict_linear(sum(openstack_nova_vcpus_used)[30d:1d], 86400*30)")
echo "vCPU in 30d: ${VCPU_PRED}"

STORAGE_GROWTH=$(query "deriv(ceph_cluster_total_used_bytes[30d]) / 1073741824 * 86400")
echo "Storage growth: ${STORAGE_GROWTH} GiB/day"

DAYS_UNTIL_FULL=$(query "(ceph_cluster_total_bytes - ceph_cluster_total_used_bytes) / deriv(ceph_cluster_total_used_bytes[30d]) / 86400")
echo "Days until storage 100%: ${DAYS_UNTIL_FULL}"

echo -e "\n=== RECOMMENDATIONS ==="
if (( $(echo "$VCPU_PCT > 70" | bc -l) )); then
    echo "⚠️  Consider adding compute nodes (vCPU > 70%)"
fi
if (( $(echo "$STORAGE_PCT > 70" | bc -l) )); then
    echo "⚠️  Consider adding storage (Ceph > 70%)"
fi
# Only a positive finite value means storage is filling up; a negative
# value (shrinking usage), "+Inf" or "NaN" is skipped.
if [[ "$DAYS_UNTIL_FULL" =~ ^[0-9]+(\.[0-9]+)?$ ]] && (( $(echo "$DAYS_UNTIL_FULL < 90" | bc -l) )); then
    echo "⚠️  Storage will be full in less than 90 days"
fi
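In `bc`, the `scale` variable sets how many decimal digits each division keeps, so the order of operations changes the result: `scale=1; 65/100*100` prints `60.0`, whereas `scale=1; 65*100/100` prints `65.0`. As an alternative to `bc` for the report's percentages, the sketch below uses `awk` floating-point arithmetic. `pct` is a hypothetical helper name, and its `n/a` fallback covers the `null` that `jq` prints when a query returns no result.

```bash
#!/usr/bin/env bash
# Sketch: awk-based percentage helper (alternative to bc).
# awk computes in floating point, so operand order does not truncate.
set -euo pipefail

# pct USED TOTAL -> percentage with one decimal, or "n/a" when TOTAL is 0
# or either value is not a plain non-negative number (e.g. "null").
pct() {
  awk -v u="$1" -v t="$2" 'BEGIN {
    if (u !~ /^[0-9.]+$/ || t !~ /^[0-9.]+$/ || t + 0 == 0) { print "n/a"; exit }
    printf "%.1f\n", u * 100 / t
  }'
}
```

Usage inside the report: `VCPU_PCT=$(pct "$VCPU_USED" "$VCPU_TOTAL")`.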

Trend diagram

graph LR
    today["Today<br/>vCPU: 65%<br/>RAM: 60%<br/>Storage: 55%"]
    d30["30 days<br/>vCPU: 72%<br/>RAM: 67%<br/>Storage: 62%"]
    d60["60 days<br/>vCPU: 79%<br/>RAM: 74%<br/>Storage: 69%"]
    d90["90 days<br/>vCPU: 86% ⚠️<br/>RAM: 81%<br/>Storage: 76%"]

    today -->|+7%| d30
    d30 -->|+7%| d60
    d60 -->|+7%| d90

    note1[Action required:<br/>Add 2 compute nodes<br/>before day 60]
    d90 -.-> note1

    style d90 fill:#ff9999
    style note1 fill:#ffff99
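The diagram assumes linear growth of 7 percentage points per 30 days. The sketch below reproduces that arithmetic so the dates can be recomputed for other growth rates; both functions are hypothetical helpers and, like `predict_linear`, they assume the trend stays linear.

```bash
#!/usr/bin/env bash
# Sketch: linear projection in percentage points per 30 days.
set -euo pipefail

# project CURRENT_PCT GROWTH_PER_30D DAYS -> projected percentage
project() {
  awk -v c="$1" -v g="$2" -v d="$3" 'BEGIN { printf "%.1f\n", c + g * d / 30 }'
}

# days_to_reach CURRENT_PCT GROWTH_PER_30D TARGET_PCT -> days, one decimal
days_to_reach() {
  awk -v c="$1" -v g="$2" -v t="$3" 'BEGIN {
    if (g <= 0) { print "never"; exit }   # no growth: target never reached
    printf "%.1f\n", (t - c) * 30 / g
  }'
}
```

With the diagram's vCPU figures, `project 65 7 90` gives `86.0` (the day-90 value) and `days_to_reach 65 7 85` gives `85.7`, so adding nodes before day 60 leaves about 25 days of margin before the 85% critical threshold.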

Practical examples

Weekly report export

# Cron entry: run the report every Monday at 08:00 and mail it
0 8 * * 1 /opt/scripts/capacity-report.sh | mail -s "Weekly Capacity Report" ops@example.com
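If reports are also archived, successive weeks can be compared. The wrapper below is a sketch under assumptions: the archive directory `/var/lib/capacity-reports` and the wrapper itself are hypothetical, and `mail` must already be configured on the host. It only does its work when called with the `run` argument.

```bash
#!/usr/bin/env bash
# Sketch: cron wrapper that archives each weekly report before mailing it.
# REPORT_DIR and RECIPIENT defaults are placeholders.
set -euo pipefail

REPORT_SCRIPT="${REPORT_SCRIPT:-/opt/scripts/capacity-report.sh}"
REPORT_DIR="${REPORT_DIR:-/var/lib/capacity-reports}"
RECIPIENT="${RECIPIENT:-ops@example.com}"

# Prints the archive path for a given date (YYYY-MM-DD).
report_path() {
  echo "${REPORT_DIR}/capacity-report-$1.txt"
}

# Writes today's report to the archive, then mails the archived copy.
run_weekly_report() {
  local out
  out=$(report_path "$(date '+%Y-%m-%d')")
  mkdir -p "$REPORT_DIR"
  "$REPORT_SCRIPT" > "$out"
  mail -s "Weekly Capacity Report" "$RECIPIENT" < "$out"
}

case "${1:-}" in
  run) run_weekly_report ;;
esac
```

The cron entry then becomes `0 8 * * 1 /opt/scripts/weekly-capacity.sh run`, where `weekly-capacity.sh` is a hypothetical name for this wrapper.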

Recording rules for performance

# /etc/kolla/config/prometheus/rules/capacity-recording.yml
groups:
  - name: capacity-recording
    interval: 5m
    rules:
      - record: capacity:vcpu:usage_ratio
        expr: sum(openstack_nova_vcpus_used) / sum(openstack_nova_vcpus_total)

      - record: capacity:memory:usage_ratio
        expr: sum(openstack_nova_memory_used_bytes) / sum(openstack_nova_memory_total_bytes)

      - record: capacity:storage:usage_ratio
        expr: ceph_cluster_total_used_bytes / ceph_cluster_total_bytes

      - record: capacity:storage:growth_rate_bytes_per_day
        expr: deriv(ceph_cluster_total_used_bytes[7d]) * 86400

      - record: capacity:instances:growth_rate_per_day
        expr: deriv(sum(openstack_nova_running_vms)[7d:1h]) * 86400
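Recorded series are queried like any other metric, which keeps dashboards and reports cheap to evaluate. The helper below is a sketch: the Prometheus URL is a placeholder, `jq` is required, and `curl --get --data-urlencode` takes care of encoding the `:` characters and any PromQL operators in the expression.

```bash
#!/usr/bin/env bash
# Sketch: read a recorded capacity series through the Prometheus HTTP API.
# PROMETHEUS is a placeholder URL.
set -euo pipefail

PROMETHEUS="${PROMETHEUS:-http://10.0.0.10:9090}"

# instant_query EXPR -> first sample value, or "null" when there is no result
instant_query() {
  curl -s --get --data-urlencode "query=$1" "${PROMETHEUS}/api/v1/query" \
    | jq -r '.data.result[0].value[1] // "null"'
}
```

For example, `instant_query 'capacity:storage:usage_ratio * 100'` returns the storage usage percentage computed from the recorded ratio.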

Resources

Checkpoint

  • Capacity metrics collected
  • Capacity planning dashboard created
  • Threshold alerts configured
  • Recording rules for projections
  • Automated report script
  • Alert thresholds documented
  • Expansion procedures defined
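Some checkpoint items can be verified automatically. The sketch below asks Prometheus for its loaded rule groups through `GET /api/v1/rules` and checks that the `capacity-planning` and `capacity-recording` groups defined on this page are present; the URL is a placeholder and `jq` is required.

```bash
#!/usr/bin/env bash
# Sketch: check that the capacity rule groups are loaded in Prometheus.
# PROMETHEUS is a placeholder URL.
set -euo pipefail

PROMETHEUS="${PROMETHEUS:-http://10.0.0.10:9090}"
REQUIRED_GROUPS=(capacity-planning capacity-recording)

# Prints the name of every loaded rule group, one per line.
loaded_groups() {
  curl -s "${PROMETHEUS}/api/v1/rules" | jq -r '.data.groups[].name'
}

# Prints OK/MISSING per required group; returns 1 if any group is missing.
check_groups() {
  local loaded g missing=0
  loaded=$(loaded_groups)
  for g in "${REQUIRED_GROUPS[@]}"; do
    if grep -Fqx -- "$g" <<<"$loaded"; then
      echo "OK       $g"
    else
      echo "MISSING  $g"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_groups || echo "reload Prometheus after deploying the rule files"`.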