How to Fix the Collation and Character Set of a MySQL Database

Sources: mostly here and not so much here.

Something about MySQL being swedish and having a default character set of latin1 and a collation of latin1_swedish_ci which is no good if you need to save utf8 data.

What is Collation?

The collation determines how results are sorted and ordered. It may be that an application requires a certain collation. You must ensure your database has the correct collation for the application it will be used with.

Collation in MySQL can be complicated because you can have a separate collation set at:

  1. The database level
  2. The table level
  3. The column level

Additionally, the information inside a column may be encoded incorrectly as well – causing the data in that column to be displayed incorrectly.

Dealing with Foreign Key constraints

It may be necessary to ignore foreign key constraints when making changes to a large number of columns. You can use the SET FOREIGN_KEY_CHECKS command to ignore foreign key constraints while you update the database.

SET FOREIGN_KEY_CHECKS=0;
 
-- Insert your other SQL Queries here...
 
SET FOREIGN_KEY_CHECKS=1;

Changing the database collation

In the example below, change: 

  • <yourDB> to your actual database name
  • <charset> to either utf8 or utf8mb4
  • <collation> to either utf8_bin or utf8mb4_bin

To change the database collation:

ALTER DATABASE <yourDB> CHARACTER SET <charset> COLLATE <collation>

Changing table collation

The following query will produce a series of ALTER TABLE statements, which you must then run against your database. 

In the example below, change: 

  • <yourDB> to your actual database name
  • <charset> to either utf8 or utf8mb4
  • <collation> to either utf8_bin or utf8mb4_bin

To change table collation:

SELECT CONCAT('ALTER TABLE `',  table_name, '` CHARACTER SET <charset> COLLATE <collation>;')
FROM information_schema.TABLES AS T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` AS C
WHERE C.collation_name = T.table_collation
AND T.table_schema = '<yourDB>'
AND
(
    C.CHARACTER_SET_NAME != '<charset>'
    OR
    C.COLLATION_NAME != '<collation>'
);

Changing column collation

The following queries (one for varchar columns, and one for non-varchar columns) will produce a series of ALTER TABLE statements, which you must then run against your database. 

In the examples below, change: 

  • <yourDB> to your actual database name
  • <charset> to either utf8 or utf8mb4
  • <collation> to either utf8_bin or utf8mb4_bin

To change column collation for varchar columns:

SELECT CONCAT('ALTER TABLE `', table_name, '` MODIFY `', column_name, '` ', DATA_TYPE, '(', CHARACTER_MAXIMUM_LENGTH, ') CHARACTER SET <charset> COLLATE <collation>', (CASE WHEN IS_NULLABLE = 'NO' THEN ' NOT NULL' ELSE '' END), ';')
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = '<yourDB>'
AND DATA_TYPE = 'varchar'
AND
(
    CHARACTER_SET_NAME != '<charset>'
    OR
    COLLATION_NAME != '<collation>'
);

To change column collation for non-varchar columns:

SELECT CONCAT('ALTER TABLE `', table_name, '` MODIFY `', column_name, '` ', DATA_TYPE, ' CHARACTER SET <charset> COLLATE <collation>', (CASE WHEN IS_NULLABLE = 'NO' THEN ' NOT NULL' ELSE '' END), ';')
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = '<yourDB>'
AND DATA_TYPE != 'varchar'
AND
(
    CHARACTER_SET_NAME != '<charset>'
    OR
    COLLATION_NAME != '<collation>'
);

Warning by My Brain:

When you need to actually do this, try this on a test database first!
I had about 100 queries to run on a 1Gb database and that took my development server about 6 hours…. Next step for me is figuring out how to roll this out on the live server.

My queries were like these:

SET FOREIGN_KEY_CHECKS=0;
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE `tablename` CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE `tablename` MODIFY `fieldnamevarchar` VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE `tablename` MODIFY `fieldnametext` TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
SET FOREIGN_KEY_CHECKS=1;

And the solution by My Brain!
Instead of changing one field per query I merged this into one query per table which enormously improved the speed. Like so:

SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE `tablename`
  MODIFY `fieldnamevarchar` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci,
  MODIFY `fieldnametext` text CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  CHARSET=utf8, COLLATE=utf8_general_ci;
SET FOREIGN_KEY_CHECKS=1;

This took about 6 minutes on the still-ever-so-slow-dev-server and just under 1 minute 45 seconds on the live server …. a more acceptable down-time for maintenance.

Have a nice day!