Mysql 4.0.x LATIN1 to Mysql 4.1.x UTF-8 Nightmares

Whew! Finally I have accomplished what I had been putting off for a while, namely a MySQL upgrade that had been due. Here’s the problem: ­What would normally be a straightforwa­rd LATIN1 or ­ISO-8859-1 to UTF-8 character conversion turns out NOT to be. Why? Well, just because the old database character set defaulted to LATIN1, and most things were stored in LATIN1, not everything was. This very website, powered by WordPress, has been storing data as UTF-8 since forever. So we have a problem. Can you see it? The database and its data are mostly LATIN1, but some data are not. An automated mass conversion would not be possible. See why I had been putting it off? The solution, if you can call it a solution, is to convert all the tables that are NOT WordPress first. mysqldump -h localhost -u username -p --all-databases -c dbname > dump.sql Then modify that file by cutting and pasting WordPress from the SQL file (or any other application that stores its data in UTF-8). Using vim you only have to place your cursor at the beginning of WordPress, where it says CREATE DATABASE wordpress, hit “ctrl-v” then “shift-v” and then “/” and type CREATE DATABASE. This has the result of selecting all text between the CREATE DATABASE statements. It’s extremely convenient for selecting possibly hundreds of megabytes of text. Next, hit the key “d” to delete the SQL statements pertaining to WordPress. WordPress is now stored in your buffer. You can save that file, then type “e” and a filename like “wordpress.sql”. Next hit “ctrl-p” to paste what is in your buffer (possibly hundreds of megabytes of data). Exit and save. You now have two files: one called dump.sql with all of your LATIN-1 data, and another called wordpress.sql with just your WordPress stuff. Now convert your dump.sql file to UTF-8 iconv -f ISO-8859-1 -t UTF-8 dump.sql > dump_utf8.sql In this case, we will start with a fresh install of Mysql (so we don’t have to drop any databases). Import like this (because we used –all-databases in the dump, all the DROP and CREATE statements are included, so no need to manually create the databases): mysql -u username --max_allowed_packet=16M -p --default-character-set=utf8 dbname < dump_utf8.sql­ ­ I assume you have backed up your old installation, just in case this whole procedure goes south.­ In my case, keyword changes between MySQL 4.0.x to MySQL 4.1.x resulted in various import errors. I have to go through the SQL dump and alter the database schemas by hand. Be advised. It was thorny, but I write this little tutorial to let you know that it is possible. I received no less than 20 different errors in my database import. I altered them one by one, wiped, and re-imported. Eventually, my database was ticking along just fine. It can be done. Next you just have to import your wordpress.sql file as it was, and you are done. Of course, MySQL 5.x is out and 4.1.x is screaming to be upgraded, so I’ll now have to wade knee deep into the thicket of thorns again. This exercise above, has prepared me for the pain though, I think I’ll be able to handle it. Actually, 4.1.x to 5.x wasn’t bad.  No problems.  Just follow the standard upgrade guide.  Make sure to do a full dump and just import everything as specified.  Since we were already on UTF-8, there weren’t any gotcha on my end.

You may also like...