In this post, you will learn what we've learned when migrating a database from Aurora Serverless (mysql 5.6) to AWS Aurora (mysql 8, provisioned).
Our database was about 100Gb big, indices included.
- How to migrate
- How to prepare your application for a replicated database cluster
- RDS Instance size and costs
- Would we do it again?
How to migrate from Aurora 2 (mysql5.6) serverless to Aurora 3 (mysql8) provisioned
You will not be able to use AWS DMS properly with Aurora V1.
The quickest way to accomplish the migration was by creating a snapshot and restoring it in a new cluster.
Migrate your Aurora serverless v1 (mysql5.6) database to Aurora v3 provisioned (mysql8) following these steps.
- 2 minutes: Take a snapshot of your serverless V1 database.
- 30 minutes: Restore the snapshot into a provisioned cluster. We choose a db.r5.large instance.
- 14 minutes: Upgrade the new provisioned cluster from aurora mysql 5.6 to aurora mysql 5.7
- 16 minutes: Upgrade aurora 2.11.0 (mysql 5.7) to aurora 3.02.1 (mysql8)
- Change the database endpoints in your application to the new endpoints. You can also use DNS CNames with low TTL to switch quickly back and forth, might something be wrong.
So this way you can have a new cluster up and running in limited time with a 100Gb dataset.
How to prepare your application
Aurora serverless V1 only has a single database endpoint. However, with Aurora provisioned, we can deploy multiple read only instances, and have a single writer instance.
So Aurora provides us by default with a writer endpoint and a reader endpoint. If you have no replica's these endpoints wil direct to the same db instance, the writer.
Perhaps your application framework supports writer and reader endpoints more or less out of the box, as was the case with us (Using PHP Symfony framework with DBAL/Doctrine).
So, we prepared the application as follows.
Before the migration:
- We set up 2 CNames ('writer' and 'reader') to point to our pre-migration database endpoint - since AWS Serverless v1 only has one endpoint, the two CNames point to the same endpoint. Set the TTL low.
- Configure the application configuration to use replicated databases and provide the 'reader' and 'writer' cnames.
We can now modify to the reader and writer CName to point to another database if we want.
- Now create a new cluster as described above from a database snapshot. The new Aurora cluster will have a reader endpoint and a writer endpoint (i.e. 'aurora-writer' and 'aurora-reader')
- Modify the 'reader' C-Name to point to the 'aurora-reader' endpoint, and the 'writer' C-Name to the 'aurora-writer' endpoint.
Now your application uses the distinct reader and writer endpoints of your new Aurora cluster. If you haven't added replica's yet, these will both resolve to the writer.
If you did add replica's, you will use the reader for read operations and the writer for write operations.
Thoughts about your application with a replicated database
Whenever you write (update, insert or DML statements) to the writer, it takes some time for the changes to propagate to the replica's - this is known as replication lag (for Aurora, for replica's within the same region, this is typically between 5 - 20 ms).
So, after your application writes to the database, it should not use the reader endpoints anymore for reads since these may contain stale data.
Your framework is probably configured this way:
- Use the reader endpoint by default
- After the application writes to the database, use the writer connection exclusively
- Writes include starting a transaction, modifying tables, executing update/insert statements etc.
Because after a db write, your connection switches to the single writer instance, you must try to execute writes as late as possible because otherwise the advantages of having a read only replica will have disappeared:
- Do not use the same database connection for sessions, use a separate connection if you need to store sessions in the database
- Before writing to the database, try to read as much information from the database as possible
Our application consists of a 'web' part connected to a load balancer, and multiple workers that mostly import and manipulate data in the database (think of these as 'cron jobs').
Since the purpose of these workers is to manipulate data in the database, we configured the application so that the worker processes exclusively use the writer endpoint and never the reader endpoint.
This prevents several issues related to replica lag, and each worker will have the most recent data to act upon.
RDS Instance size and costs
We were a bit concerned about the costs for the memory and cpu reservations of the new Aurora cluster.
Because we migrated to mysql 8, we lost the query cache.
Our serverless v1 stats showed it had a hit percentage of 70%, so dropping the cache could seriously impact the resources needed for our database. On the other hand, the query cache is known to negatively affect performance for servers with high concurrency loads (as is the case).
Determining the expected requirements for our RDS instances
First, we 'translate' the serverless ACU consumption to a rough indication of the memory and cpu available in our serverless cluster.
Our serverless database consumed about 32 ACU to function properly. 1 ACU equals roughly 2Gb of memory, and as an educated guess 0.5 vCPU.
So our initial minimal estimate of our new cluster was 64Gb of memory and 16vCPU's.
We then looked at how much bang we can get for provisioned instances.
For about the same price, it turned out we could get about 3 db.r6g instances of 4 cpu cores and 32gb memory each - so that would be 12 cpu cores and 96Gb of memory in total. Note that these are ARM cores, which are real cores, as opposed to hyper threaded cores for intel machines. This should do great since memory would be our most needed requirement.
So we started off with 3 of these instances, one writer and two replica's.
The new cluster in production
This setup was proven to be more than adequate, our 'writer' instance is under quite a predictable load of between 20-40% cpu, and our readers about 40%.
More importantly, our serverless cluster experienced mysterious spikes in cpu and drops in performance quite frequently. These spikes have all disappeared in our cluster.
As a follow up, we decided to drop one reader and add autoscaling readers. So we have one steady writer and reader, and additional readers once the cpu of the single reader reaches a treshold (70% in our case).
This makes our new cluster effectively cheaper than serverless v1, with more capacity available.
When autoscaling kicks in it takes about 10-15 minutes before a new reader instance is launched and active. Since our scaling threshold was at 70% (allowing for roughly 0.4 times more load) this may sound a little tight. But since we did not have spikes mysteriously pushing up the cpu sometimes by 30-40%, as in serverless v1, it works more than smoothly in practice.
When our sites get busy, be it due to the time of day, or due to outgoing notifications and campaigns, autoscaling kicks in soon enough to have the extra capacity available when it is actually needed.
Would we do it again?
Definitely yes. Multiple headaches were solved by the migration:
- Our peak capacity is much higher for the same price
- RDS costs dropped on a monthly basis
- We got rid of mysterious peaks adding sudden bursts of 30-40% cpu on serverless v1
All of the following contributed to these benefits:
- the mysql 5.6 -> mysql 8 engine upgrade
- moving from serverless to provisioned
- the aurora version upgrade
- employing a dedicated writer and replicated readers