In this article, we’ll cover the basics of what a Character Set and a Collation are and provide simple examples. Understanding them can ease your life with multilingual websites, help you fix issues when transferring a site from one server to another, and make your text display the right way. This is an easy-to-follow guide to Database Character Set Encoding and Collation.
A Character Set is essentially a map of Characters and their Encodings. We can picture each Encoding as a number, for example:
A – 0; B – 1; C – 2; D – 3; a – 4; b – 5; c – 6; d – 7
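The toy Character Set above can be sketched in Python as a simple dictionary, assuming it contains only these eight letters. The names TOY_CHARSET, OTHER_CHARSET, encode and decode are hypothetical and chosen for illustration; real Character Sets such as utf8 map far more characters. OTHER_CHARSET numbers the same letters in a different order, to show what happens when text is read back with a different Character Set than it was written with, which is the kind of garbled text you can run into after moving a site between servers.

```python
# Hypothetical toy Character Set from the example: each character maps to a number (its Encoding).
TOY_CHARSET = {"A": 0, "B": 1, "C": 2, "D": 3, "a": 4, "b": 5, "c": 6, "d": 7}

# Hypothetical second Character Set that numbers the same letters in a different order.
OTHER_CHARSET = {"a": 0, "b": 1, "c": 2, "d": 3, "A": 4, "B": 5, "C": 6, "D": 7}


def encode(text, charset):
    """Turn text into the list of numbers the given Character Set assigns to it."""
    return [charset[char] for char in text]


def decode(codes, charset):
    """Turn a list of numbers back into text using the given Character Set."""
    reverse = {code: char for char, code in charset.items()}
    return "".join(reverse[code] for code in codes)


codes = encode("Bad", TOY_CHARSET)
print(codes)                         # [1, 4, 7]
print(decode(codes, TOY_CHARSET))    # Bad  (same Character Set: the text survives)
print(decode(codes, OTHER_CHARSET))  # bAD  (different Character Set: the text is garbled)
```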
A Collation is a set of rules that defines how Characters are compared using their Encodings. For example, comparing A to B with the Character Set above gives:
A – 0 < B – 1
B is bigger than A because its Encoding is bigger (1 > 0). A Collation can also define Case Insensitive comparisons, for example by treating A (0) and a (4) as equal.
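The comparison rules above can be sketched in Python as two toy Collations over the same hypothetical Character Set: a binary one that compares the raw numbers, and a Case Insensitive one that looks up each letter's uppercase form first. The function names are made up for illustration; real Collations such as utf8_general_ci cover many more characters and rules.

```python
# Same hypothetical toy Character Set as in the example.
TOY_CHARSET = {"A": 0, "B": 1, "C": 2, "D": 3, "a": 4, "b": 5, "c": 6, "d": 7}


def binary_key(text):
    """Binary Collation: compare strings by their raw Encoding numbers."""
    return [TOY_CHARSET[char] for char in text]


def case_insensitive_key(text):
    """Case Insensitive Collation: use the uppercase form, so 'a' and 'A' get the same number."""
    return [TOY_CHARSET[char.upper()] for char in text]


# Python compares lists element by element, like comparing strings character by character.
print(binary_key("A") < binary_key("B"))                       # True  (0 < 1)
print(binary_key("a") == binary_key("A"))                      # False (4 != 0)
print(case_insensitive_key("a") == case_insensitive_key("A"))  # True  (both 0)

words = ["b", "A", "a", "B"]
print(sorted(words, key=binary_key))            # ['A', 'B', 'a', 'b']
print(sorted(words, key=case_insensitive_key))  # ['A', 'a', 'b', 'B']
```

In the Case Insensitive sort, 'b' stays before 'B' only because that was their original order: the Collation treats them as equal, and Python's sort keeps equal items in their input order.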
Recommended Database Character Set Encoding and Collation
It is recommended that all Databases are created using utf8 as the Character Set and utf8_general_ci as the Collation. A sample CREATE command would be:
CREATE DATABASE IF NOT EXISTS `database_name` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
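UTF-8 stores each character in 1 to 4 bytes, depending on the character. One caveat worth knowing: in MySQL, utf8 is an alias for utf8mb3, which uses at most 3 bytes per character, so characters that need 4 bytes in UTF-8 (such as most emoji) cannot be stored in a utf8 column; MySQL's utf8mb4 Character Set, with a Collation such as utf8mb4_general_ci, handles them. The Python sketch below counts the UTF-8 bytes of a few sample characters; fits_in_utf8mb3 is a hypothetical helper written for illustration, not part of any library.

```python
def utf8_byte_length(char):
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def fits_in_utf8mb3(text):
    """Hypothetical helper: True if every character needs at most 3 bytes in UTF-8,
    the limit of MySQL's utf8 (utf8mb3) Character Set."""
    return all(utf8_byte_length(char) <= 3 for char in text)


samples = {
    "Latin capital A": "A",              # 1 byte
    "e with acute": "\u00e9",            # 2 bytes
    "Cyrillic Zhe": "\u0416",            # 2 bytes
    "CJK ideograph": "\u4e2d",           # 3 bytes
    "grinning face emoji": "\U0001F600", # 4 bytes
}

for name, char in samples.items():
    print(f"{name}: U+{ord(char):04X}, {utf8_byte_length(char)} byte(s)")

print(fits_in_utf8mb3("caf\u00e9 \u4e2d"))  # True
print(fits_in_utf8mb3("hello \U0001F600"))  # False
```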