Introduction
This guide documents a complete OpenStack 2025.2 Flamingo deployment built for TSB. Every command, every error, and every fix is captured here because real learning happens when things break.
What we built:
- A 5-node virtual OpenStack cluster on a single AMD Ryzen 9 9950X workstation
- Kolla-Ansible as the deployment engine
- OpenStack 2025.2 Flamingo, the latest release at time of writing
- Full stack: Compute (Nova), Networking (Neutron/OVS), Block Storage (Cinder), Identity (Keystone), Image (Glance), Orchestration (Heat)
Time to expect: End-to-end including VM creation, OS preparation, Kolla-Ansible installation, configuration, and first instance takes roughly 4 hours in a lab environment. The actual Kolla-Ansible install and deploy steps alone take around 1.5 hours. Your mileage will vary depending on internet speed and how many errors you hit.
Philosophy: OpenStack on VMs is not recommended for production. Physical nodes give you real performance and proper isolation. However, for learning the internals, a virtualised lab is perfectly valid and that is exactly what this is.
Host Hardware
| Component | Spec |
|---|---|
| CPU | AMD Ryzen 9 9950X (16-core, 32 threads) |
| RAM | ~198GB |
| Nested Virtualisation | Enabled (KVM/AMD) |
Verify nested virt is enabled before starting:
cat /sys/module/kvm_amd/parameters/nested
# Must return: 1
VM Architecture
| Node | vCPUs | RAM | Disk | Role |
|---|---|---|---|---|
| os-controller | 6 | 24GB | 100GB | Control plane: Keystone, Glance, Nova API, Neutron API, Cinder API, Horizon, Heat |
| os-compute01 | 6 | 16GB | 50GB | Nova compute |
| os-compute02 | 6 | 16GB | 50GB | Nova compute |
| os-storage | 4 | 8GB | 50GB boot + 3x40GB | Cinder block storage |
| os-network | 4 | 8GB | 30GB | Neutron, OVS |
IP addressing:
| Node | IP |
|---|---|
| os-controller | 192.168.1.160 |
| os-compute01 | 192.168.1.161 |
| os-compute02 | 192.168.1.162 |
| os-storage | 192.168.1.163 |
| os-network | 192.168.1.164 |
VIP (Virtual IP): 192.168.1.160 (pointed at controller for single-node control plane)
OS Preparation
All nodes run Ubuntu 24.04 LTS, cloned from a base image with cloud-init.
Network Interface Naming
Ubuntu 24.04 uses predictable network naming by default (enp6s19 etc). Force legacy ethX naming for consistency across all nodes. This is critical because Kolla-Ansible references interface names by string.
sudo nano /etc/default/grub
Change:
GRUB_CMDLINE_LINUX=""
To:
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"
sudo update-grub && sudo reboot
Second Network Interface (eth1)
All nodes except os-storage need a second NIC (eth1) for Neutron's external interface. Do not assign an IP to eth1 -- Neutron/OVS takes it over completely as a raw bridge port.
Create a netplan config for eth1 on all applicable nodes:
sudo tee /etc/netplan/eth1.yaml << EOF
network:
version: 2
ethernets:
eth1:
dhcp4: false
dhcp6: false
optional: true
EOF
sudo chmod 600 /etc/netplan/eth1.yaml
sudo netplan apply
No output after netplan apply means success. If you see a permissions warning before running chmod, that is expected -- just run chmod before applying.
Why optional: true? Without it, systemd-networkd-wait-online will wait 2 minutes on every boot for eth1 to come online. Since eth1 has no IP, systemd considers it "not ready" and times out. optional: true tells systemd this interface does not block boot.
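Since a missing optional: true is easy to overlook, a small pre-apply guard can catch it. This is a minimal sketch -- the temp file stands in for /etc/netplan/eth1.yaml and the messages are illustrative:

```bash
# Sketch: guard against applying an eth1 config that would block boot.
# The temp file stands in for /etc/netplan/eth1.yaml; adapt as needed.
cfg=$(mktemp)
cat > "$cfg" << 'EOF'
network:
  version: 2
  ethernets:
    eth1:
      dhcp4: false
      dhcp6: false
      optional: true
EOF
if grep -q 'optional: true' "$cfg"; then
  echo "eth1 config OK: interface will not block boot"
else
  echo "WARNING: missing optional: true -- expect a 2 minute boot stall"
fi
```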
Cloud-Init: Persistent /etc/hosts
Cloud-init overwrites /etc/hosts on every reboot by default. Fix this permanently by removing the update_etc_hosts module:
sudo sed -i '/- update_etc_hosts/d' /etc/cloud/cloud.cfg
Then add your hosts entries:
sudo tee -a /etc/hosts << EOF
192.168.1.160 os-controller
192.168.1.161 os-compute01
192.168.1.162 os-compute02
192.168.1.163 os-storage
192.168.1.164 os-network
EOF
Why edit cloud.cfg directly? The drop-in directory /etc/cloud/cloud.cfg.d/ with manage_etc_hosts: false only controls the template renderer. The update_etc_hosts module still runs regardless. You must remove the module from the run list itself.
Do this on all 5 nodes and reboot to verify it survives.
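To see what the sed actually does, here is the same edit demonstrated on a trimmed stand-in for /etc/cloud/cloud.cfg (the sample contents are illustrative, not the real file):

```bash
# Sketch: demonstrate the module removal on a stand-in for
# /etc/cloud/cloud.cfg (contents are illustrative).
sample=$(mktemp)
cat > "$sample" << 'EOF'
cloud_init_modules:
 - seed_random
 - update_etc_hosts
 - ca_certs
EOF
# Delete the update_etc_hosts entry from the module run list
sed -i '/- update_etc_hosts/d' "$sample"
if ! grep -q 'update_etc_hosts' "$sample"; then
  echo "update_etc_hosts removed; /etc/hosts edits will now survive reboots"
fi
```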
Baseline Verification
Run on all 5 nodes to confirm everything is correct before proceeding:
hostname && ip a | grep 192.168 && free -h | grep Mem && nproc && lsblk
Node Connectivity Test
From the controller:
for node in os-controller os-compute01 os-compute02 os-storage os-network; do
echo -n "Testing $node: "
ping -c 1 $node | grep -q "1 received" && echo "OK" || echo "FAIL"
done
Expected output:
Testing os-controller: OK
Testing os-compute01: OK
Testing os-compute02: OK
Testing os-storage: OK
Testing os-network: OK
All 5 must show OK before proceeding.
LVM Volume Group for Cinder
On os-storage only, create the LVM volume group that Kolla-Ansible expects:
sudo pvcreate /dev/sdb /dev/sdc /dev/sdd
sudo vgcreate cinder-volumes /dev/sdb /dev/sdc /dev/sdd
sudo vgs
Expected output from vgs:
VG #PV #LV #SN Attr VSize VFree
cinder-volumes 3 0 0 wz--n- <119.99g <119.99g
Kolla-Ansible's precheck will fail with Volume group "cinder-volumes" not found if this is missing. The name cinder-volumes is hardcoded and must match exactly.
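Why vgs reports <119.99g rather than a clean 120g: LVM reserves a small metadata area per physical volume. A rough back-of-envelope sketch -- the ~4MiB-per-PV figure is an approximation of default LVM overhead, not an exact constant:

```bash
# Rough arithmetic behind the vgs VSize column (approximation only).
pv_mib=$((40 * 1024))   # each 40GiB disk, in MiB
meta_mib=4              # approx. per-PV metadata/alignment overhead
usable_mib=$(( 3 * (pv_mib - meta_mib) ))
# Convert back to GiB with two decimals
awk -v m="$usable_mib" 'BEGIN { printf "usable ~= %.2f GiB\n", m/1024 }'
```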
Kolla-Ansible Installation
All Kolla-Ansible work is done from os-controller only.
Estimated time: 10-15 minutes
Install Prerequisites
sudo apt install -y python3-dev libffi-dev gcc libssl-dev python3-pip python3-venv git
Create Virtual Environment
sudo mkdir -p /opt/kolla-venv
sudo chown $USER:$USER /opt/kolla-venv
python3 -m venv /opt/kolla-venv
source /opt/kolla-venv/bin/activate
Optionally, add to .bashrc so the venv auto-activates on login:
echo "source /opt/kolla-venv/bin/activate" >> ~/.bashrc
You should see (kolla-venv) prepended to your shell prompt:
(kolla-venv) ubuntu@os-controller:~$
Install Kolla-Ansible
pip install -U pip
pip install kolla-ansible
Expected output (final lines):
Successfully installed ansible-core-2.19.6 kolla-ansible-21.0.0 ...
Kolla-Ansible 21.0.0 (Flamingo) will pull in ansible-core 2.19.6 and manage its own dependencies.
Install Ansible Galaxy Dependencies
kolla-ansible install-deps
Expected output:
Starting galaxy collection install process
Process install dependency map
...
kolla.kolla (21.0.0) was installed successfully
Setup Configuration Directory
sudo mkdir -p /etc/kolla
sudo chown $USER:$USER /etc/kolla
cp -r /opt/kolla-venv/share/kolla-ansible/etc_examples/kolla/* /etc/kolla/
cp /opt/kolla-venv/share/kolla-ansible/ansible/inventory/multinode /etc/kolla/multinode
Configuration
Generate Passwords
kolla-genpwd
No output means success. This populates /etc/kolla/passwords.yml with secure random passwords for every OpenStack service. Never edit this file by hand.
Configure Inventory
Edit /etc/kolla/multinode and replace the top section:
[control]
os-controller
[network]
os-network
[compute]
os-compute01
os-compute02
[monitoring]
os-controller
[storage]
os-storage
[deployment]
localhost ansible_connection=local
Leave everything below [common:children] untouched -- those sections inherit from the groups above.
Configure globals.yml
Edit /etc/kolla/globals.yml (use Ctrl+W to search in nano):
kolla_base_distro: "ubuntu"
openstack_release: "2025.2"
kolla_internal_vip_address: "192.168.1.160"
network_interface: "eth0"
neutron_external_interface: "eth1"
enable_haproxy: "no"
enable_cinder: "yes"
enable_cinder_backend_lvm: "yes"
Why VIP = controller IP? With a single controller, HAProxy has nothing to balance. Pointing the VIP at the controller's actual IP avoids MariaDB connection failures that occur when Kolla tries to reach a VIP that nothing is listening on.
Verify active settings (strips commented lines):
grep -E 'kolla_base_distro|openstack_release|vip_address|network_interface|neutron_external|haproxy|cinder' /etc/kolla/globals.yml | grep -v '^#'
Expected output:
kolla_base_distro: "ubuntu"
openstack_release: "2025.2"
kolla_internal_vip_address: "192.168.1.160"
network_interface: "eth0"
neutron_external_interface: "eth1"
enable_haproxy: "no"
enable_cinder: "yes"
enable_cinder_backend_lvm: "yes"
SSH Key Setup
Kolla-Ansible SSHes from the controller to all other nodes. The controller also needs to SSH to itself by hostname.
Generate a key on the controller:
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
Add the controller's public key to its own authorized_keys:
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
Distribute to all other nodes from your laptop:
CONTROLLER_KEY=$(ssh ubuntu@192.168.1.160 cat ~/.ssh/id_ed25519.pub)
for ip in 192.168.1.161 192.168.1.162 192.168.1.163 192.168.1.164; do
ssh ubuntu@$ip "echo '$CONTROLLER_KEY' >> ~/.ssh/authorized_keys"
done
Verify from the controller:
for node in os-controller os-compute01 os-compute02 os-storage os-network; do
echo -n "Testing $node: "
ssh -o ConnectTimeout=5 ubuntu@$node hostname
done
Expected output:
Testing os-controller: os-controller
Testing os-compute01: os-compute01
Testing os-compute02: os-compute02
Testing os-storage: os-storage
Testing os-network: os-network
Deployment
Bootstrap
Installs Docker and all prerequisites on all 5 nodes.
Estimated time: 5-10 minutes
kolla-ansible bootstrap-servers -i /etc/kolla/multinode
Expected final output:
PLAY RECAP *******************************
localhost : ok=1 changed=0 failed=0
os-compute01 : ok=44 changed=16 failed=0
os-compute02 : ok=44 changed=16 failed=0
os-controller: ok=44 changed=16 failed=0
os-network : ok=44 changed=16 failed=0
os-storage : ok=44 changed=16 failed=0
All nodes must show failed=0 before proceeding.
Common bootstrap errors:
| Error | Cause | Fix |
|---|---|---|
| Host key verification failed on os-controller | Controller has not accepted its own SSH host key | ssh ubuntu@os-controller to accept the key, then add public key to authorized_keys |
| Interface not found on os-controller | Cascade failure from SSH error above | Fix the SSH error first |
Pre-flight Checks
Estimated time: 2-3 minutes
kolla-ansible prechecks -i /etc/kolla/multinode
Expected final output:
PLAY RECAP *******************************
os-controller: ok=87 changed=0 failed=0
os-compute01 : ok=51 changed=0 failed=0
os-compute02 : ok=51 changed=0 failed=0
os-network : ok=51 changed=0 failed=0
os-storage : ok=32 changed=0 failed=0
Common precheck errors:
| Error | Cause | Fix |
|---|---|---|
| Volume group "cinder-volumes" not found on os-storage | LVM VG not created | pvcreate /dev/sdb /dev/sdc /dev/sdd && vgcreate cinder-volumes /dev/sdb /dev/sdc /dev/sdd |
To debug a single failing node:
kolla-ansible prechecks -i /etc/kolla/multinode --limit os-storage -vv 2>&1 | grep -A 10 "FAILED\|fatal"
Deploy
This is the main event. Kolla-Ansible pulls all Docker images and deploys OpenStack across the cluster.
Estimated time: 30-40 minutes (varies significantly with internet speed for image pulls)
kolla-ansible deploy -i /etc/kolla/multinode
Watch progress on the controller in a separate terminal while deploy runs:
watch -n 2 'sudo docker ps --format "table {{.Names}}\t{{.Status}}"'
You will see containers appearing and starting in dependency order. After a successful deploy you should see approximately 30+ containers running on the controller alone.
Expected final output from deploy:
PLAY RECAP *******************************
os-controller: ok=293 changed=164 failed=0
os-compute01 : ok=95 changed=45 failed=0
os-compute02 : ok=83 changed=44 failed=0
os-network : ok=94 changed=39 failed=0
os-storage : ok=46 changed=15 failed=0
Total tasks across all nodes: approximately 611. All must show failed=0.
Common deploy errors:
| Error | Cause | Fix |
|---|---|---|
| Can't connect to MySQL server on '192.168.1.170' (No route to host) | VIP not reachable with HAProxy disabled | Change kolla_internal_vip_address to the controller's actual IP |
| eth1 interface issues during networking tasks | eth1 not configured in netplan | Create eth1.yaml and netplan apply on affected nodes |
Kolla-Ansible is idempotent. If deploy fails, fix the issue and re-run kolla-ansible deploy. It will resume without destroying existing containers.
Post-Deploy
Estimated time: 2-3 minutes
kolla-ansible post-deploy -i /etc/kolla/multinode
Source the admin credentials:
source /etc/kolla/admin-openrc.sh
Install the OpenStack CLI client inside the venv (not with snap or apt):
pip install python-openstackclient
Why not snap? Snap packages run in isolation and conflict with the virtual environment. The snap version of the OpenStack client is also typically several releases behind.
Verify the deployment:
openstack compute service list
Expected output:
+------+----------------+--------------+----------+-------+
| ID | Binary | Host | Zone | State |
+------+----------------+--------------+----------+-------+
| ... | nova-scheduler | os-controller| internal | up |
| ... | nova-conductor | os-controller| internal | up |
| ... | nova-compute | os-compute01 | nova | up |
| ... | nova-compute | os-compute02 | nova | up |
+------+----------------+--------------+----------+-------+
openstack network agent list
All agents should show :-) in the Alive column.
Network Setup
External Network
openstack network create \
--provider-network-type flat \
--provider-physical-network physnet1 \
--external \
--share \
ext-net
openstack subnet create \
--network ext-net \
--subnet-range 192.168.1.0/24 \
--gateway 192.168.1.1 \
--dns-nameserver 8.8.8.8 \
--dns-nameserver 1.1.1.1 \
--allocation-pool start=192.168.1.210,end=192.168.1.220 \
--no-dhcp \
ext-subnet
Floating IP pool: Carve out a range that your DHCP server does not use. In this lab pfSense serves 192.168.1.30-200, so we safely use 210-220 for floating IPs.
--no-dhcp: Critical -- OpenStack must not run DHCP on the external network. Your existing DHCP server owns that responsibility.
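A quick overlap check before committing to an allocation pool. The ranges below mirror this lab's pfSense DHCP scope; treat them as assumptions and substitute your own network's values:

```bash
# Sanity check: floating IP pool vs. LAN DHCP range (last octets).
# Values mirror this lab: pfSense DHCP .30-.200, floating pool .210-.220.
dhcp_start=30;  dhcp_end=200
pool_start=210; pool_end=220
if [ "$pool_start" -gt "$dhcp_end" ] || [ "$pool_end" -lt "$dhcp_start" ]; then
  echo "OK: floating pool does not overlap the DHCP range"
else
  echo "CONFLICT: pick a different allocation pool"
fi
```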
Internal Network
openstack network create int-net
openstack subnet create \
--network int-net \
--subnet-range 10.0.0.0/24 \
--gateway 10.0.0.1 \
--dns-nameserver 8.8.8.8 \
--dns-nameserver 1.1.1.1 \
int-subnet
Note the MTU difference: ext-net uses 1500 (flat/physical), int-net uses 1450 (VXLAN encapsulation adds 50 bytes of overhead).
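The 50-byte figure breaks down as outer IPv4 header + UDP + VXLAN header + inner Ethernet header. A quick sketch of the arithmetic:

```bash
# VXLAN MTU arithmetic: 1500 physical minus encapsulation overhead.
physical_mtu=1500
overhead=$((20 + 8 + 8 + 14))   # outer IPv4 + UDP + VXLAN + inner Ethernet
vxlan_mtu=$((physical_mtu - overhead))
echo "tenant network MTU: $vxlan_mtu"
```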
Virtual Router
openstack router create router1
openstack router set router1 --external-gateway ext-net
openstack router add subnet router1 int-subnet
Verify router wiring:
openstack router show router1
external_gateway_info should show ext-net with an IP from your floating pool. interfaces_info should show 10.0.0.1.
Horizon showing "Something went wrong" after creating networks? This is a known issue caused by a stale memcached session. Horizon cached your login token before the networks existed and now can't resolve them. Fix it with:
```bash
sudo docker restart memcached horizon
```
Then log out of Horizon completely, clear your browser session, and log back in fresh.
Hello World
Prerequisites
Flavor:
openstack flavor create \
--vcpus 1 \
--ram 512 \
--disk 10 \
m1.tiny
Upload Ubuntu 24.04 cloud image:
wget https://cloud-images.ubuntu.com/minimal/releases/noble/release/ubuntu-24.04-minimal-cloudimg-amd64.img
openstack image create \
--container-format bare \
--disk-format qcow2 \
--file ubuntu-24.04-minimal-cloudimg-amd64.img \
--public \
ubuntu-24.04
Upload CirrOS (lightweight test image, 12MB, boots in seconds):
wget http://download.cirros-cloud.net/0.6.2/cirros-0.6.2-x86_64-disk.img
openstack image create \
--container-format bare \
--disk-format qcow2 \
--file cirros-0.6.2-x86_64-disk.img \
--public \
cirros-0.6.2
CirrOS default credentials: user cirros, password gocubsgo (changed in newer images). CirrOS does not use keypair auth by default -- use the Horizon console (Compute > Instances > Console) or SSH with password authentication.
SSH Keypair:
openstack keypair create mykey > ~/.ssh/mykey.pem
chmod 600 ~/.ssh/mykey.pem
Security note: A single keypair deployed to all instances is convenient for a lab but is not suitable for production. If the private key is compromised, every instance is exposed. For production environments a Zero Trust architecture should be applied -- short-lived certificates issued by a CA (such as HashiCorp Vault SSH CA), per-role keypairs, a hardened bastion host as the single SSH entry point, and just-in-time access with full audit logging. No static keys, no permanent access.
Security Group:
openstack security group create default-web-access
openstack security group rule create --protocol icmp default-web-access
openstack security group rule create \
--protocol tcp \
--dst-port 22 \
default-web-access
openstack security group rule create \
--protocol tcp \
--dst-port 80 \
default-web-access
openstack security group rule create \
--protocol tcp \
--dst-port 443 \
default-web-access
Launch Instance
openstack server create \
--flavor m1.tiny \
--image ubuntu-24.04 \
--network int-net \
--security-group default-web-access \
--key-name mykey \
hello-world
Watch status:
watch -n 2 'openstack server show hello-world | grep -E "status|task_state|addresses|host"'
You will see the instance progress through BUILD / spawning, then BUILD / networking, and finally reach ACTIVE. The Ubuntu 24.04 cloud image typically takes 30-60 seconds.
Expected final state:
| OS-EXT-SRV-ATTR:host | os-compute01 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | os-compute01 |
| addresses | int-net=10.0.0.6 |
| status | ACTIVE |
Assign Floating IP
openstack floating ip create ext-net
openstack server add floating ip hello-world <floating-ip>
Verify:
openstack server show hello-world | grep addresses
# int-net=10.0.0.6, 192.168.1.214
SSH Access
From the controller:
ssh -i ~/.ssh/mykey.pem ubuntu@<floating-ip>
From your laptop (after copying the key):
scp ubuntu@192.168.1.160:~/.ssh/mykey.pem ~/.ssh/mykey.pem
chmod 600 ~/.ssh/mykey.pem
ssh -i ~/.ssh/mykey.pem ubuntu@<floating-ip>
Optionally, add your own key for passwordless access going forward (replace with whatever .pub file you use):
ssh-copy-id -i ~/.ssh/id_ed25519.pub -o "IdentityFile ~/.ssh/mykey.pem" ubuntu@<floating-ip>
Useful Commands
List all instances:
openstack server list
Check service health:
openstack compute service list
openstack network agent list
openstack volume service list
Watch containers on a node:
watch -n 2 'sudo docker ps --format "table {{.Names}}\t{{.Status}}"'
Follow a container log:
sudo docker logs -f <container-name>
Connect to MariaDB:
DB_PASS=$(sudo grep '^database_password:' /etc/kolla/passwords.yml | awk '{print $2}')
sudo docker exec -it mariadb mariadb -u root -p"$DB_PASS"
Redeploy after config change:
kolla-ansible deploy -i /etc/kolla/multinode
Run prechecks on single node:
kolla-ansible prechecks -i /etc/kolla/multinode --limit os-storage -vv
Lessons Learned
| # | Issue | Root Cause | Fix |
|---|---|---|---|
| 1 | /etc/hosts wiped on reboot | Cloud-init update_etc_hosts module runs on every boot | Remove module from cloud.cfg run list |
| 2 | MariaDB No route to host on VIP | kolla_internal_vip_address pointed at unused VIP with HAProxy disabled | Set VIP to controller's actual IP |
| 3 | Precheck fails on os-storage | cinder-volumes LVM volume group not created | pvcreate + vgcreate cinder-volumes on storage node |
| 4 | Bootstrap SSH fails on os-controller | Controller had not accepted its own host key | SSH to self once + add public key to authorized_keys |
| 5 | eth1 interfaces DOWN | No netplan config for eth1 | Create eth1.yaml with optional: true |
| 6 | 2 minute boot delay | systemd-networkd-wait-online waiting for eth1 | Add optional: true to eth1 netplan config |
| 7 | Horizon 500 error after network creation | Stale memcached session from before networks existed | Restart memcached + clear browser session |
| 8 | Instance fails on compute02 | KVM/nested virt not enabled on compute02 VM | Enable CPU passthrough in hypervisor settings for compute02 |
| 9 | Floating IP unreachable | eth1 interfaces DOWN, OVS bridge not connected to physical network | Bring eth1 UP via netplan |
| 10 | Netplan permissions warning | File permissions too open | chmod 600 /etc/netplan/eth1.yaml |
Architecture Overview
Your Laptop
  | SSH / HTTP
  v
192.168.1.210-220 (Floating IPs on ext-net)
  | NAT/SNAT through virtual router
  v
10.0.0.0/24 (int-net, instance private network)
  | VXLAN tunnel between compute nodes
  v
os-compute01 / os-compute02 (KVM/libvirt via nova_libvirt)
  | iSCSI (for Cinder volumes)
  v
os-storage (Cinder LVM, cinder-volumes VG)

All API calls -> os-controller (Keystone, Nova, Neutron, Glance, Cinder, Heat APIs)
All network traffic -> os-network (Neutron L3, DHCP, Metadata, OVS)
What's Next
- Breaking session -- kill a compute node, observe, recover
- Zantu Cloud deployment -- real SaaS workload on the cluster
- Juju / Charmed OpenStack -- Day 2 operations deep dive
- Masakari -- automatic instance HA
- HashiCorp Vault SSH CA -- Zero Trust access
Built for TSB
OpenStack 2025.2 Flamingo | Kolla-Ansible 21.0.0
February 2026