Automated In-Place PostgreSQL Upgrade with Zero Downtime

Written by Sven Haster | 25/04/2025

Earlier this year, it was announced that PostgreSQL 12 would reach its End-of-Life. At Topicus KeyHub, our goal is to always run supported versions of the software we use. This meant that a migration from PostgreSQL 12 to PostgreSQL 16 was necessary.

Last summer, we collaborated with the Infra team of Topicus Education to plan an in-place upgrade with less than a second of downtime in a high-availability (HA) cluster. This automated approach is reusable for future upgrades, allowing us to easily stay up-to-date with new PostgreSQL versions.

The Challenges of a Major PostgreSQL Upgrade

At Topicus KeyHub, one of our core principles is that every component must run on a supported version, and preferably on the latest one. This also applies to our database.

However, performing a major PostgreSQL upgrade is not a simple task—especially not in an HA cluster where minimal downtime is crucial. A typical way to upgrade a database is to take it fully offline, perform the migration, and then bring the new version online.

This method, however, can easily result in downtime of an hour or more, which is unacceptable in an HA setup. In such an environment, we aim for zero or just a few milliseconds of downtime. Furthermore, because Topicus KeyHub runs on customer-owned hardware to which we have no direct access, the entire upgrade must be automated and performed in-place.

Streaming PostgreSQL 12 Database to PostgreSQL 16 Using Logical Replication

Within Topicus Education, the databases of ParnasSys and Somtoday also required upgrades. In collaboration with the Infra team, we developed a plan based on logical replication. By using logical replication, we were able to stream data from a PostgreSQL 12 database to a PostgreSQL 16 database in real-time.

Standard (physical) streaming replication ships the write-ahead log in its binary form, which is efficient but incompatible between major versions. Logical replication, on the other hand, decodes those changes into row-level inserts, updates, and deletes that do not depend on the on-disk format. This makes it possible to synchronize data between databases of different versions.

Once the replication was set up, all that remained was to switch the connections from the old database to the new one.

How to Achieve a Zero-Downtime Upgrade in an HA Cluster

For the upgrade in an HA cluster, we first took one of the replicas offline, noting up to which point on the timeline it had received data. This database was then upgraded from PostgreSQL 12 to 16 using the standard PostgreSQL upgrade tooling (“pg_upgrade”). After that, the replica was brought back online, and its statistics and indexes were refreshed.
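The upgrade step on the detached replica could be sketched roughly as follows. This is only an illustration, not our actual automation: the helper names and all paths are hypothetical, and the use of --link (hard links instead of copying files) is an assumption. The flags themselves belong to the standard pg_upgrade, vacuumdb, and reindexdb tools.

```python
from pathlib import PurePosixPath


def pg_upgrade_command(old_bin, new_bin, old_data, new_data,
                       link=True, check_only=False):
    """Build the argv for pg_upgrade. Paths are placeholders (assumption)."""
    argv = [
        str(PurePosixPath(new_bin) / "pg_upgrade"),
        "--old-bindir", str(old_bin),
        "--new-bindir", str(new_bin),
        "--old-datadir", str(old_data),
        "--new-datadir", str(new_data),
    ]
    if link:
        # Assumption: hard-link data files instead of copying them.
        argv.append("--link")
    if check_only:
        # Dry run: only verify that the two clusters are compatible.
        argv.append("--check")
    return argv


def post_upgrade_commands(new_bin):
    """Commands that refresh planner statistics and rebuild indexes."""
    bin_dir = PurePosixPath(new_bin)
    return [
        [str(bin_dir / "vacuumdb"), "--all", "--analyze-in-stages"],
        [str(bin_dir / "reindexdb"), "--all"],
    ]


if __name__ == "__main__":
    # Hypothetical Debian-style layout, used only for illustration.
    cmd = pg_upgrade_command(
        "/usr/lib/postgresql/12/bin", "/usr/lib/postgresql/16/bin",
        "/var/lib/postgresql/12/main", "/var/lib/postgresql/16/main",
        check_only=True,
    )
    print(" ".join(cmd))
```

Running the command once with check_only=True before the real run is a cheap way to catch incompatibilities while the replica is still untouched. Each argv list can be passed to subprocess.run as-is.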

Logical Replication to the Upgraded Node

At this point, we had a replica database upgraded in the usual way and were ready to use logical replication to minimize downtime in the cluster.

We started logical replication from the primary database (still running version 12) to the PostgreSQL 16 database, beginning from the point where the replica had last been synchronized before the upgrade. Meanwhile, users could continue working, so all changes needed to be replicated to the newly upgraded database.

Upgraded Node Becomes Primary

Once the replication caught up, we redirected all connections to the upgraded replica, effectively making it the new primary database. This is the only moment where users could theoretically experience a brief interruption, but the switch happens so quickly that it’s unlikely any user would even notice.
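Whether replication has "caught up" can be expressed as a comparison of two WAL positions. The sketch below is a minimal illustration with hypothetical function names. It assumes the two values are read from pg_current_wal_lsn() on the old primary and from the confirmed_flush_lsn column of pg_replication_slots, and that writes to the old primary are paused at the moment of the final check.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/16B3748' to a byte position.

    An LSN is printed as two hexadecimal numbers: the high and the low
    32 bits of a 64-bit position in the write-ahead log.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the subscriber has not yet confirmed (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(confirmed_flush_lsn))


def safe_to_switch(primary_lsn: str, confirmed_flush_lsn: str,
                   max_lag_bytes: int = 0) -> bool:
    """True once the subscriber is within max_lag_bytes of the primary."""
    return replication_lag_bytes(primary_lsn, confirmed_flush_lsn) <= max_lag_bytes


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    print(replication_lag_bytes("0/3000060", "0/3000000"))
    print(safe_to_switch("0/3000060", "0/3000060"))
```

A threshold of zero is the strictest choice. It only makes sense if new writes are held back briefly while the connections are redirected, which is what keeps the interruption so short.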

Updating Remaining Nodes

Within milliseconds, the connections had been switched, and KeyHub was running on PostgreSQL 16. The final step was to rebuild the other two databases in the cluster as copies of the new primary database. From then on, they receive data via the usual binary protocol again. This process works similarly to adding a new server to the cluster.
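Seeding a new standby from the primary is typically done with a base backup. The following sketch builds such a command; it is an assumption that pg_basebackup is the tool involved here (a cluster manager may perform this step itself), and the host, user, and path values are hypothetical. The flags are standard pg_basebackup options.

```python
def basebackup_command(primary_host, replication_user, target_datadir):
    """Build the argv that clones the new primary into an empty data directory.

    All three arguments are placeholders supplied by the caller (assumption).
    """
    return [
        "pg_basebackup",
        "--host", primary_host,
        "--username", replication_user,
        "--pgdata", target_datadir,
        "--wal-method=stream",    # stream the WAL needed for a consistent copy
        "--write-recovery-conf",  # configure the copy to follow the primary
        "--checkpoint=fast",      # do not wait for the next scheduled checkpoint
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(basebackup_command(
        "new-primary", "replicator", "/var/lib/postgresql/16/main")))
```

Once the copy is started as a standby, it follows the new primary through physical streaming replication, since both nodes now run the same major version.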

Seamless Upgrades to New PostgreSQL Versions

With this approach, we’ve developed a flexible and repeatable solution for upgrading PostgreSQL databases. Thanks to logical replication, we can minimize downtime—even in complex HA environments.

Going forward, we will continue to use this method to keep our databases up to date. This ensures that Topicus KeyHub continues to run on a reliable and modern infrastructure, without any disruption to our users.
