Migrating Django from MySQL to PostgreSQL the Easy WayJuly 18th, 2010
I recently moved NewsBlur from MySQL to PostgreSQL for a variety of reasons, but most of all I want to use connection pooling and database replication using Slony, and Postgres has a great track record and community. But all of my data was stored in MySQL and there is no super easy way to move from one database backend to another.
Luckily, since I was using the Django ORM, and with Django 1.2’s multi-db support, I can use Django’s serializers to move the data from MySQL’s format into JSON and then back into Postgres.
Unfortunately, If I were to use the command line, every single row of my models has to be loaded into memory. Issuing commands like this:
would take a long, long time, and it wouldn’t even work since I don’t have even close to enough memory to make that work.
Luckily, the dumpdata and loaddata management commands are actually just wrappers on the internal serializers in Django. I decided to iterate through my models and grab 500 rows at a time, serialize them and then immediately de-serialize them (so Django could move from database to database without complaining).
This assume that you have both databases setup in your
settings.py like so:
Note that I changed my default database to the Postgres database, because otherwise some management commands would still try to run on the default MySQL database. This is probably resolved and I didn’t do something right, but when I migrated, I changed Postgres to be the default database.
I just run the short script in the Django console and wait however long it takes. This script prints out which set it’s working on, so you can at least track the progress, which might take a long, long time, but is much less prone to crashing like dumpdata and loaddata.
A word of warning to those with large datasets. Instead of iterating straight through the table, see if you have a handier index already built on the table. I have a table with a million rows, but there are a few indices which can quickly find stories throughout the table, rather than having to order and offset the entire table by primary key. Adapt the following code to suit your needs, but notice that I use an index on the Feed column in the Story table.
This makes it much faster, since I only have to sort a few hundreds records rather than the entire Story table and its million rows.
Also of note is that while all of the data made it into the Postgres tables, the sequences (counts) were all off. Many were at 0. To remedy this easily, just use the count of the table itself and store it in the sequence table, like so:
I just made a quick text macro on the table names. This quickly set all of the sequences to their correct amounts.