Emond Papegaaij 16/06/2022 8 min read

Performing the OS upgrade

The OS upgrade from CentOS 7 to AlmaLinux OS 8 is an important step to execute after upgrading to Topicus KeyHub 20.1. This step has to be performed manually and may require some modification to the VM prior to starting.

This guide explains the steps required to perform the OS upgrade. You do not have to perform the upgrade on your own, because we offer our customers extensive assistance with it.

 

Prerequisites

Before the upgrade can be started, please make sure you've completed the following steps:

  • Check for updates either via the appliance manager when running online or on our website when running offline.
  • Upgrade the VM to the latest version of Topicus KeyHub that is still available for CentOS 7. At the time of writing, this is 20.1.
  • Install any pending OS updates.
  • Reboot the VM if required.
  • Download the AlmaLinux 8 update bundle, called topicus-keyhub-20.1-update-alma8.tar.gz.gpg.
  • Create a full virtual machine snapshot in your hypervisor. This is very important! Topicus KeyHub will not be able to recover in case of an error during the upgrade.
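Parts of this checklist can be verified from a shell on the VM. The following is a minimal sketch, not an official KeyHub tool. It assumes OS packages are managed by yum, as on stock CentOS 7, and that you have shell access to the appliance. The function names describe_check_update and bundle_ok are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Topicus KeyHub.

# Translate the exit status of `yum check-update` into a readable verdict.
# yum uses 0 for "no updates", 100 for "updates available" and 1 for errors.
describe_check_update() {
    case "$1" in
        0)   echo "no pending updates" ;;
        100) echo "updates pending" ;;
        *)   echo "check failed" ;;
    esac
}

# File name of the update bundle as given in the checklist above.
BUNDLE_NAME="topicus-keyhub-20.1-update-alma8.tar.gz.gpg"

# Succeed only if the given path has the expected file name and is non-empty.
bundle_ok() {
    [ "$(basename "$1")" = "$BUNDLE_NAME" ] && [ -s "$1" ]
}
```

On the VM, the first helper could be used as `yum -q check-update >/dev/null 2>&1; describe_check_update "$?"`. The second takes the path to which you downloaded the bundle.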

 

Virtual Machine Configuration

Depending on your hypervisor platform and deployment configuration, some modifications might need to be made to the virtual machine before the upgrade can be started. Check these items carefully.

 

LSI Logic Disk Controller

When running on VMware vSphere, the disk controller is most likely set to LSI Logic Parallel. This controller is no longer supported by the AlmaLinux 8 kernel and therefore needs to be replaced. Power down the virtual machine and open the settings of the VM in the vSphere client. Here you will find an entry named SCSI controller 0. If it is indeed set to LSI Logic Parallel, open the options for that controller and change its type to VMware Paravirtual. Save the settings and power the VM back on.

Figure: SCSI Controller 0 still set to LSI Logic Parallel
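The controller type can also be checked from inside the guest, before and after the change, by looking at which SCSI driver the kernel has loaded. This is a rough sketch that assumes the drivers are loaded as modules and therefore appear in /proc/modules. The function names are made up for this illustration.

```shell
#!/bin/sh
# Map a kernel module name to the vSphere controller type it serves.
# mptspi drives LSI Logic Parallel, mptsas drives LSI Logic SAS and
# vmw_pvscsi drives VMware Paravirtual.
controller_for_module() {
    case "$1" in
        mptspi)     echo "LSI Logic Parallel" ;;
        mptsas)     echo "LSI Logic SAS" ;;
        vmw_pvscsi) echo "VMware Paravirtual" ;;
        *)          echo "unknown" ;;
    esac
}

# Report every known SCSI controller driver that is currently loaded.
# /proc/modules lists one loaded module per line, with the name first.
# The file is a parameter so the function can be tried on a sample file.
loaded_scsi_controllers() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] || return 0
    awk '{ print $1 }' "$modules_file" | while read -r mod; do
        kind=$(controller_for_module "$mod")
        [ "$kind" = "unknown" ] || echo "$mod: $kind"
    done
}

loaded_scsi_controllers
```

If the output still lists mptspi after the VM has been powered back on, the controller type was probably not changed.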

NIC Configuration

Support for some NICs has been dropped in the newer kernel. This will most likely not cause any issues, because the default e1000 is still available. However, if you've changed the adapter type on your installation or added additional NICs, you may face issues. Check the network adapters in the settings of your hypervisor. On vSphere, make sure none of them has its type set to Flexible. If you need to update the adapter type, it is important that you keep the MAC address the same, otherwise the appliance will lose its network configuration. With a snapshot present, it may not be possible to change the adapter type. In that case, add a new adapter with the same MAC address and remove the old one.
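To cross-check the hypervisor settings from inside the guest, you can list every NIC with its MAC address and kernel driver. The sketch below reads this from sysfs. The function name list_nic_drivers is made up for this illustration. Which driver a Flexible adapter shows up with is not covered here. It is commonly pcnet32 or vmxnet, but treat that as an assumption to verify against the VMware documentation.

```shell
#!/bin/sh
# Print "name mac driver" for every network interface except loopback.
# The sysfs root is a parameter so the function can be pointed at a test tree.
list_nic_drivers() {
    sys_net="${1:-/sys/class/net}"
    for dev in "$sys_net"/*; do
        # Skip entries that are not interfaces (or an unmatched glob).
        [ -e "$dev/address" ] || continue
        name=$(basename "$dev")
        [ "$name" = "lo" ] && continue
        mac=$(cat "$dev/address")
        # device/driver is a symlink to the bound driver; virtual devices lack it.
        if [ -L "$dev/device/driver" ]; then
            driver=$(basename "$(readlink "$dev/device/driver")")
        else
            driver="none"
        fi
        echo "$name $mac $driver"
    done
}

list_nic_drivers
```

Noting the MAC addresses from this output before changing anything makes it easier to give a replacement adapter the same MAC.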

If your virtual machine has multiple NICs, it is possible that one or more use kernel NIC names, such as eth0 and eth1. These names are no longer supported because the order of NICs cannot be guaranteed. The name of a NIC must be explicitly bound to its MAC address. To check the names of the NICs on your VM, run the following command:

[root@key-test-app ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth-tkh: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:97:5e:d1 brd ff:ff:ff:ff:ff:ff
    inet 172.25.241.20/22 brd 172.25.243.255 scope global noprefixroute eth-tkh
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:97:1a:93 brd ff:ff:ff:ff:ff:ff
    inet 172.25.243.253/22 brd 172.25.243.255 scope global dynamic noprefixroute eth1
       valid_lft 43139sec preferred_lft 43139sec
....

 

In this output, notice the interface named eth1. This NIC must be renamed before the upgrade can be started. You can follow the steps provided by Red Hat for this process, or use the steps below to bind a network script to this NIC and manually configure its name.

[root@key-test-app ~]# ip link set eth1 down
[root@key-test-app ~]# ip link set eth1 name eth-alt
[root@key-test-app ~]# ip link set eth-alt up
[root@key-test-app ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth-alt
NAME="eth-alt"
DEVICE="eth-alt"
HWADDR="00:50:56:97:1a:93"
BOOTPROTO=dhcp
ONBOOT=yes
NM_CONTROLLED="yes"
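After the rename, it may be worth confirming that no interface still carries a kernel-style name and that the HWADDR in the network script matches the MAC address of the NIC. The following sketch shows one way to do that. The function names are made up for this illustration, and the pattern eth followed by digits is an assumption about what counts as a kernel NIC name.

```shell
#!/bin/sh
# Print interface names that still follow the kernel scheme (eth0, eth1, ...).
# Names are read from stdin, one per line. No match is not an error.
kernel_style_names() {
    grep -E '^eth[0-9]+$' || true
}

# Succeed if HWADDR in an ifcfg file equals the given MAC address,
# ignoring letter case and optional double quotes around the value.
ifcfg_matches_mac() {
    # $1 = path to the ifcfg file, $2 = MAC address to compare against
    cfg_mac=$(sed -n 's/^HWADDR="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1" | tr 'A-F' 'a-f')
    want=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -n "$cfg_mac" ] && [ "$cfg_mac" = "$want" ]
}

# List any interfaces on this machine that would still need renaming.
if [ -d /sys/class/net ]; then
    ls /sys/class/net | kernel_style_names
fi
```

For the example above, `ifcfg_matches_mac /etc/sysconfig/network-scripts/ifcfg-eth-alt "$(cat /sys/class/net/eth-alt/address)"` should succeed once the script is in place.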

 

Starting the OS Upgrade

It is now time to start the OS upgrade. Click the link on the dashboard of the appliance manager, tick the checkbox on the warning and upload the upgrade bundle. The upgrade will now start. This will take quite some time and the VM will reboot several times. Do not intervene during these reboots.

What if Something has Gone Wrong

If the upgrade process detects a problem in its initial phase, it aborts the script, but the user interface keeps showing the upgrade step as running. If the CPU usage drops to near zero for some time and the VM does not reboot, it is likely that something has gone wrong. To find the cause, log on via SSH and inspect the logs.

/var/log/osupgrade-check.log This log file contains information about the planning phase of the upgrade. Check this file to find out whether the upgrade detected incompatible kernel modules, misconfiguration or installed software that cannot be upgraded. If this file contains an error and none of the log files mentioned below are present, the upgrade was aborted before it started. You must first address the error and can then restart the upgrade via docker restart tkh-admin, because restarting the appliance manager automatically restarts the upgrade.

/var/log/osupgrade.log This log file contains information about the upgrade itself. Once this file exists, your system will no longer be operational until the upgrade has finished. This is because the first step of the upgrade removes some important but incompatible software packages. When an error is recorded in this file, you will have to revert to your VM snapshot, address the error and restart the process.

/var/log/osupgrade-cleanup.log This log file contains information about the cleanup performed after the upgrade. If any of these steps cause problems, you can address the error and run /opt/topicus/tkh-post-osupgrade again manually.
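The three log files can be used for a quick triage, as in the sketch below. It assumes that each log is created when its phase begins and is not removed afterwards, so the most advanced log that exists tells you how far the upgrade got. The function names upgrade_phase and next_step are made up for this illustration. You still need to read the log itself to find the actual error.

```shell
#!/bin/sh
# Report the most advanced upgrade phase for which a log file exists.
# The log directory is a parameter so the function can be tried elsewhere.
upgrade_phase() {
    log_dir="${1:-/var/log}"
    if [ -e "$log_dir/osupgrade-cleanup.log" ]; then
        echo "cleanup"
    elif [ -e "$log_dir/osupgrade.log" ]; then
        echo "upgrade"
    elif [ -e "$log_dir/osupgrade-check.log" ]; then
        echo "planning"
    else
        echo "not-started"
    fi
}

# Print the recovery path described in the text for a given phase.
next_step() {
    case "$1" in
        planning) echo "fix the reported error, then run: docker restart tkh-admin" ;;
        upgrade)  echo "on error: revert to the VM snapshot, fix the cause, restart the process" ;;
        cleanup)  echo "fix the reported error, then run: /opt/topicus/tkh-post-osupgrade" ;;
        *)        echo "no upgrade logs found" ;;
    esac
}

phase=$(upgrade_phase)
echo "phase: $phase"
next_step "$phase"
```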