
SLURP Upgrades

Introduction

SLURP (Skip Level Upgrade Release Process) lets you skip one OpenStack release when upgrading: you move directly from one SLURP release to the next, bypassing the intermediate non-SLURP release. This section covers planning and executing such upgrades with Kolla-Ansible.

Prerequisites

  • A stable OpenStack production deployment
  • Complete, verified backups
  • A staging environment for testing
  • A planned maintenance window

Learning points

OpenStack release cycle

graph LR
    subgraph "2024"
        caracal["2024.1 Caracal<br/>(SLURP)"]
        dalmatian["2024.2 Dalmatian"]
    end

    subgraph "2025"
        epoxy["2025.1 Epoxy<br/>(SLURP)"]
        future["2025.2 (Future)"]
    end

    caracal --> dalmatian
    dalmatian --> epoxy
    caracal -.->|SLURP skip one| epoxy

    caracal:::slurp
    epoxy:::slurp
    dalmatian:::nonslurp
    future:::nonslurp

    classDef slurp fill:#90EE90
    classDef nonslurp fill:#FFFFE0

SLURP releases:

  • Extended support period
  • Skip-level upgrades supported
  • Recommended for production

Non-SLURP releases:

  • Latest features first
  • Sequential upgrade required
  • Better suited to development/staging
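
Because SLURP releases are the YYYY.1 releases, a valid skip-level jump always goes from one year's .1 release to the next year's .1 release (here 2024.1 -> 2025.1). As a quick illustration, a minimal, hypothetical helper (not part of Kolla-Ansible) could sanity-check a planned path:

#!/bin/bash
# check-slurp-path.sh - minimal sketch; hypothetical helper, not a Kolla-Ansible tool

SOURCE="${1:-2024.1}"
TARGET="${2:-2025.1}"

src_year="${SOURCE%%.*}"; src_minor="${SOURCE##*.}"
tgt_year="${TARGET%%.*}"; tgt_minor="${TARGET##*.}"

# SLURP skip-level path: YYYY.1 -> (YYYY+1).1
if [ "$src_minor" = "1" ] && [ "$tgt_minor" = "1" ] && [ "$tgt_year" -eq $((src_year + 1)) ]; then
    echo "OK: $SOURCE -> $TARGET is a SLURP skip-level upgrade"
else
    echo "WARNING: $SOURCE -> $TARGET is not a SLURP jump; upgrade sequentially"
    exit 1
fi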

Upgrade strategy

flowchart TD
    Start([Start]) --> Plan[Planning<br/>- Review release notes<br/>- Identify deprecations<br/>- Schedule maintenance window]

    Plan --> Prep[Prepare environment]

    subgraph "Staging"
        Prep --> Clone[Clone configuration]
        Clone --> Deploy[Deploy target version]
        Deploy --> TestFunc[Functional tests]
        TestFunc --> TestPerf[Performance tests]
    end

    TestPerf --> TestOK{Tests OK?}
    TestOK -->|Yes| PlanProd[Schedule production upgrade]
    TestOK -->|No| Fix[Fix issues]
    Fix --> Retest[Retest]
    Retest --> TestOK

    subgraph "Pre-upgrade Production"
        PlanProd --> Backup[Full backup]
        Backup --> Snapshot[Snapshot critical VMs]
        Snapshot --> Comm1[Notify users]
        Comm1 --> Drain[Drain workloads - optional]
    end

    subgraph "Upgrade Production"
        Drain --> UpCtrl[Upgrade control plane]
        UpCtrl --> VerifServ[Verify services]
        VerifServ --> UpCompute[Upgrade compute nodes]
        UpCompute --> VerifVMs[Verify VMs]
    end

    subgraph "Post-upgrade"
        VerifVMs --> TestValid[Validation tests]
        TestValid --> Monitor[Intensive monitoring]
        Monitor --> Comm2[Announce end of maintenance]
    end

    Comm2 --> Critical{Critical<br/>issues?}
    Critical -->|Yes| Rollback[Rollback]
    Rollback --> PostMortem[Post-mortem]
    Critical -->|No| Doc[Document the upgrade]
    Doc --> Archive[Archiver backups]

    PostMortem --> End([Stop])
    Archive --> End
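
The "Clone configuration" step of the staging branch can be as simple as copying the production Kolla configuration to the staging deployment host and swapping in a staging inventory. A minimal sketch; the staging-deploy host name and the inventory-staging file are examples, not fixed conventions:

#!/bin/bash
# clone-config-to-staging.sh - minimal sketch; host and file names are examples

STAGING_HOST="staging-deploy"   # hypothetical staging deployment host

# Copy the production configuration (globals.yml, passwords.yml, service overrides)
rsync -av /etc/kolla/ ${STAGING_HOST}:/etc/kolla/

# Replace the inventory with a staging-specific one
scp /etc/kolla/inventory-staging ${STAGING_HOST}:/etc/kolla/inventory

# On the staging host: bump openstack_release to the target version, then
#   kolla-ansible -i /etc/kolla/inventory deploy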

Kolla-Ansible upgrade preparation

#!/bin/bash
# prepare-upgrade.sh

SOURCE_VERSION="2024.1"  # Caracal
TARGET_VERSION="2025.1"  # Epoxy

echo "=== Préparation upgrade $SOURCE_VERSION$TARGET_VERSION ==="

# 1. Backup configuration actuelle
BACKUP_DIR="/backup/upgrade-$(date +%Y%m%d)"
mkdir -p $BACKUP_DIR

cp -r /etc/kolla $BACKUP_DIR/
cp /etc/ansible/inventory $BACKUP_DIR/

# 2. Mettre à jour kolla-ansible
pip install --upgrade "kolla-ansible==${TARGET_VERSION}.*"

# 3. Copier nouvelle configuration
cp -r /usr/share/kolla-ansible/etc_examples/kolla/* /etc/kolla/
# Merger avec l'ancienne config (manuel)

# 4. Générer les différences de config
kolla-ansible -i /etc/kolla/inventory genconfig --check

# 5. Télécharger nouvelles images
kolla-ansible -i /etc/kolla/inventory pull

echo "Préparation terminée. Vérifier les configurations avant upgrade."

Upgrade configuration

# /etc/kolla/globals.yml

# Target version
openstack_release: "2025.1"
kolla_base_distro: "ubuntu"
# Note: kolla_install_type is no longer needed on recent releases (source is the only type)
kolla_install_type: "source"

# Options d'upgrade
kolla_upgrade_allow_stop_services: "yes"
kolla_upgrade_skip_deprecation_warnings: "no"

# Rolling upgrade (minimize downtime)
kolla_enable_rolling_upgrade: "yes"
kolla_serial: "1"  # Upgrade one node at a time

# Database migrations
database_upgrade_max_retries: 5
database_upgrade_retry_delay: 30
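
Before editing openstack_release it is worth confirming which release is actually running on each node; the tags of the running Kolla containers carry that information. A quick check, assuming the default Kolla container and image naming:

# Show the image tags of a few running Kolla containers on this node
docker ps --format '{{.Names}}\t{{.Image}}' | grep -E 'keystone|nova|neutron'

# Or across every host in the inventory (ad-hoc Ansible, assumes SSH access)
ansible -i /etc/kolla/inventory all -m shell \
    -a "docker ps --format '{{.Image}}' | sort -u | head -5"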

Complete upgrade script

#!/bin/bash
# upgrade-openstack.sh

set -e
LOG_FILE="/var/log/openstack-upgrade-$(date +%Y%m%d).log"
exec > >(tee -a $LOG_FILE) 2>&1

SOURCE_VERSION="2024.1"
TARGET_VERSION="2025.1"
INVENTORY="/etc/kolla/inventory"

echo "=========================================="
echo "OpenStack Upgrade: $SOURCE_VERSION$TARGET_VERSION"
echo "Started: $(date)"
echo "=========================================="

# Service check helper
check_services() {
    echo "Checking OpenStack services..."
    openstack service list
    openstack compute service list
    openstack network agent list
    openstack volume service list
}

# === Phase 1: Pre-checks ===
echo -e "\n=== Phase 1: Pre-checks ==="

# Check current state
check_services

# Check disk space
df -h /var/lib/docker
if [ $(df /var/lib/docker | tail -1 | awk '{print $5}' | tr -d '%') -gt 80 ]; then
    echo "ERROR: Docker storage > 80% used"
    exit 1
fi

# Check backups
LATEST_BACKUP=$(ls -t /backup/mariadb/*.tar.gz 2>/dev/null | head -1)
if [ -z "$LATEST_BACKUP" ]; then
    echo "ERROR: No recent backup found"
    exit 1
fi
echo "Latest backup: $LATEST_BACKUP"

# === Phase 2: Pull new images ===
echo -e "\n=== Phase 2: Pull new images ==="
kolla-ansible -i $INVENTORY pull

# Check images
docker images | grep ${TARGET_VERSION} | head -10

# === Phase 3: Pre-upgrade checks ===
echo -e "\n=== Phase 3: Pre-upgrade checks ==="
kolla-ansible -i $INVENTORY prechecks

# === Phase 4: Upgrade control plane ===
echo -e "\n=== Phase 4: Upgrade control plane ==="

# Upgrade core control plane services (each service runs its own DB schema migrations)
echo "Upgrading control plane services..."
kolla-ansible -i $INVENTORY upgrade --tags mariadb,keystone,glance,cinder,neutron,nova,heat,horizon

# === Phase 5: Verify control plane ===
echo -e "\n=== Phase 5: Verify control plane ==="
sleep 30  # Wait for services to stabilize

# Test Keystone
openstack token issue

# Test Nova
openstack server list --all-projects --limit 5

# Test Neutron
openstack network list --limit 5

# === Phase 6: Upgrade compute nodes ===
echo -e "\n=== Phase 6: Upgrade compute nodes ==="

# Rolling upgrade of the compute nodes (hosts listed in the [compute] group)
for compute in $(awk '/^\[compute\]/{f=1; next} /^\[/{f=0} f && NF && $1 !~ /^#/ {print $1}' $INVENTORY); do
    echo "Upgrading compute: $compute"

    # Migrate VMs away (optional): disable the service, then live-migrate each
    # instance, e.g. openstack server migrate --live-migration <server>
    # openstack compute service set --disable $compute nova-compute

    # Upgrade compute
    kolla-ansible -i $INVENTORY upgrade --limit $compute --tags nova-compute

    # Re-enable
    # openstack compute service set --enable $compute nova-compute

    # Verify
    sleep 10
    openstack compute service list | grep $compute
done

# === Phase 7: Post-upgrade validation ===
echo -e "\n=== Phase 7: Post-upgrade validation ==="
check_services

# Test VM creation
echo "Testing VM creation..."
openstack server create --flavor m1.tiny --image cirros --network internal \
    test-upgrade-vm --wait

openstack server show test-upgrade-vm
openstack server delete test-upgrade-vm

# === Phase 8: Cleanup ===
echo -e "\n=== Phase 8: Cleanup ==="

# Remove old images
docker image prune -f

# Cleanup old config backups (keep last 5)
ls -t /backup/kolla_config_*.tar.gz | tail -n +6 | xargs rm -f

echo -e "\n=========================================="
echo "Upgrade Completed: $(date)"
echo "=========================================="

Rollback procedure

#!/bin/bash
# rollback-upgrade.sh

set -e

BACKUP_DIR=$1
if [ -z "$BACKUP_DIR" ]; then
    echo "Usage: $0 <backup_directory>"
    echo "Available backups:"
    ls -la /backup/upgrade-*/
    exit 1
fi

echo "=== Rolling back to backup: $BACKUP_DIR ==="

# 1. Stop all services
echo "Stopping all services..."
kolla-ansible -i /etc/kolla/inventory stop --yes-i-really-really-mean-it

# 2. Restore configuration
echo "Restoring configuration..."
rm -rf /etc/kolla
cp -r $BACKUP_DIR/kolla /etc/

# 3. Restore the database
echo "Restoring database..."
LATEST_DB=$(ls -t $BACKUP_DIR/mariadb*.tar.gz /backup/mariadb/*.tar.gz 2>/dev/null | head -1)
/opt/scripts/restore-mariadb.sh $LATEST_DB

# 4. Downgrade images
#    Reinstall the previous kolla-ansible version matching the restored config first
#    (e.g. pip install "git+https://opendev.org/openstack/kolla-ansible@stable/<previous-release>");
#    the pull then uses the old openstack_release from the restored globals.yml
echo "Pulling old images..."
kolla-ansible -i /etc/kolla/inventory pull

# 5. Redeploy
echo "Redeploying services..."
kolla-ansible -i /etc/kolla/inventory deploy

# 6. Verify
echo "Verifying services..."
openstack service list
openstack compute service list

echo "Rollback completed"

Upgrade sequence diagram

sequenceDiagram
    participant Admin
    participant Kolla-Ansible
    participant Control Plane
    participant Database
    participant Compute Nodes
    participant VMs

    Admin->>Kolla-Ansible: pull new images
    Kolla-Ansible->>Control Plane: Download images
    Kolla-Ansible->>Compute Nodes: Download images

    Admin->>Kolla-Ansible: prechecks
    Kolla-Ansible->>Control Plane: Verify requirements
    Kolla-Ansible->>Database: Check connectivity

    Admin->>Kolla-Ansible: upgrade control plane
    Kolla-Ansible->>Database: Stop MariaDB
    Kolla-Ansible->>Database: Upgrade schema
    Kolla-Ansible->>Database: Start MariaDB
    Kolla-Ansible->>Control Plane: Stop Keystone
    Kolla-Ansible->>Control Plane: Deploy new Keystone
    Kolla-Ansible->>Control Plane: Start Keystone

    loop Each service
        Kolla-Ansible->>Control Plane: Stop service
        Kolla-Ansible->>Database: Run migrations
        Kolla-Ansible->>Control Plane: Deploy new version
        Kolla-Ansible->>Control Plane: Start service
    end

    Admin->>Admin: Verify control plane

    loop Each compute node
        Admin->>Kolla-Ansible: upgrade compute
        Kolla-Ansible->>Compute Nodes: Stop nova-compute
        Note over VMs: VMs continue running
        Kolla-Ansible->>Compute Nodes: Deploy new nova-compute
        Kolla-Ansible->>Compute Nodes: Start nova-compute
        Admin->>Admin: Verify compute
    end

    Admin->>Admin: Full validation
    Admin->>Admin: Cleanup old images

Ceph upgrade (separate)

#!/bin/bash
# upgrade-ceph.sh

# Ceph upgrades are performed separately from OpenStack upgrades.
# On a cephadm-managed cluster the orchestrator drives the rolling upgrade
# itself (MONs, then MGRs, then OSDs); do not upgrade daemons one by one.

CURRENT="reef"       # 18.x
TARGET="19.2.0"      # Squid - use the exact point release you are targeting

echo "=== Ceph Upgrade: $CURRENT -> $TARGET ==="

# 1. Check cluster health
ceph health
ceph osd tree

# 2. Optional safety flags (the orchestrator handles daemon restarts, but these
#    prevent rebalancing if an OSD stays down longer than expected)
ceph osd set noout
ceph osd set norebalance

# 3. Start the orchestrated rolling upgrade (MONs, MGRs, then OSDs)
ceph orch upgrade start --ceph-version $TARGET

# 4. Monitor progress
watch ceph orch upgrade status
# or: ceph -s / ceph -W cephadm

# 5. Remove the flags
ceph osd unset noout
ceph osd unset norebalance

# 6. Verify
ceph health
ceph versions
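
If the orchestrated upgrade misbehaves, it can be paused or stopped rather than left half-finished:

# Pause the rolling upgrade (already-upgraded daemons keep running)
ceph orch upgrade pause

# Resume once the problem is understood
ceph orch upgrade resume

# Abort the upgrade (already-upgraded daemons are NOT downgraded)
ceph orch upgrade stop

# See which daemon versions currently coexist in the cluster
ceph versions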

Practical examples

Upgrade checklist

## Pre-Upgrade Checklist

### Planning (1 week before)
- [ ] Review the OpenStack ${TARGET_VERSION} release notes
- [ ] Identify breaking changes
- [ ] Test in staging
- [ ] Schedule the maintenance window
- [ ] Notify users

### Day -1
- [ ] Full MariaDB backup
- [ ] Back up configurations
- [ ] Snapshot critical VMs
- [ ] Check disk space
- [ ] Pull the new images

### Upgrade day
- [ ] Open the maintenance window
- [ ] Check cluster state
- [ ] Run the control plane upgrade
- [ ] Verify services
- [ ] Upgrade compute nodes
- [ ] Run validation tests

### Post-Upgrade
- [ ] Intensive monitoring for 24 h
- [ ] Announce the end of maintenance
- [ ] Document the upgrade
- [ ] Post-mortem if there were incidents
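
The "Snapshot critical VMs" item on day -1 can be scripted; a minimal sketch, assuming the critical instances carry a (hypothetical) "critical" tag:

#!/bin/bash
# snapshot-critical-vms.sh - minimal sketch; the "critical" tag is an example

DATE=$(date +%Y%m%d)

for vm in $(openstack server list --tags critical -f value -c Name); do
    echo "Snapshotting $vm..."
    openstack server image create --name "${vm}-pre-upgrade-${DATE}" --wait "$vm"
done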

Post-upgrade validation test

#!/bin/bash
# validate-upgrade.sh

ERRORS=0

echo "=== OpenStack Post-Upgrade Validation ==="

# 1. API Endpoints
echo -e "\n[1] Testing API endpoints..."
for service in identity compute network image volume; do
    # matches both http and https endpoint URLs
    if openstack endpoint list --service $service | grep -q "http"; then
        echo "✓ $service endpoint OK"
    else
        echo "✗ $service endpoint FAILED"
        ((ERRORS++))
    fi
done

# 2. Service status
echo -e "\n[2] Checking service status..."
if openstack compute service list | grep -q "down"; then
    echo "✗ Some compute services are down"
    openstack compute service list | grep down
    ((ERRORS++))
else
    echo "✓ All compute services up"
fi

# 3. Network agents
echo -e "\n[3] Checking network agents..."
if openstack network agent list | grep -q "XXX"; then
    echo "✗ Some network agents are down"
    ((ERRORS++))
else
    echo "✓ All network agents up"
fi

# 4. Create test resources
echo -e "\n[4] Testing resource creation..."
openstack network create test-upgrade-net --provider-network-type vxlan
openstack subnet create test-upgrade-subnet --network test-upgrade-net --subnet-range 192.168.99.0/24
openstack server create --flavor m1.tiny --image cirros --network test-upgrade-net test-upgrade-vm --wait

if openstack server show test-upgrade-vm | grep -q "ACTIVE"; then
    echo "✓ VM creation successful"
else
    echo "✗ VM creation failed"
    ((ERRORS++))
fi

# Cleanup
openstack server delete test-upgrade-vm --wait
openstack subnet delete test-upgrade-subnet
openstack network delete test-upgrade-net

# Summary
echo -e "\n=== Validation Summary ==="
if [ $ERRORS -eq 0 ]; then
    echo "All tests PASSED"
    exit 0
else
    echo "$ERRORS test(s) FAILED"
    exit 1
fi
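
The validation script above does not exercise Cinder; a short extra block in the same style (reusing the ERRORS counter) could cover volume creation:

# 5. (Optional) Volume creation test, in the same style as the checks above
openstack volume create --size 1 test-upgrade-vol
sleep 10
if openstack volume show test-upgrade-vol -f value -c status | grep -q "available"; then
    echo "✓ Volume creation successful"
else
    echo "✗ Volume creation failed"
    ((ERRORS++))
fi
openstack volume delete test-upgrade-vol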

Resources

Checkpoint

  • SLURP strategy understood
  • Staging environment prepared
  • Upgrade scripts created
  • Rollback procedure documented
  • Upgrade checklist completed
  • Validation tests automated
  • First upgrade tested in staging
  • Documentation updated