What is: Unicode?

Unicode is one of those key elements of the internet that most people use every day without being aware of it. Unicode is a global standard for encoding, transmitting, and reading text data. This standard is constantly evolving. As of June 2018, Unicode version 11.0 is the most recent version, containing more than 137,000 characters drawn from more than 140 modern and ancient scripts.


In effect, Unicode is a catalog of the characters that a computer can store and transmit as text. These include letters such as “a” as well as symbols such as the “@” sign used in email addresses. Unicode supports text that reads left to right (as in English) as well as text that reads right to left (as in Arabic).
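Text direction is recorded as a property of each character. The short Python sketch below uses the standard library's unicodedata module to look that property up; the sample characters and the helper name direction_class are chosen only for illustration and are not part of the standard.

```python
import unicodedata

# Sample characters picked for illustration only.
samples = {
    "Latin a": "a",
    "Arabic beh": "\u0628",
    "Hebrew alef": "\u05d0",
}

def direction_class(char: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(char)

for label, char in samples.items():
    # "L" means left-to-right; "R" and "AL" mean right-to-left.
    print(label, direction_class(char))
```

Software that lays out text on screen uses these classes to decide which way a run of characters should flow.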


The Unicode standard is supported by just about every modern programming language, operating system, and web platform, including WordPress.


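Technically, Unicode text can be stored using several different encodings, including UTF-8, UTF-16, and UTF-32. The most common is UTF-8, which represents each character as a sequence of one to four 8-bit units (bytes); plain English letters need only one. Approximately 90% of all websites on the internet use UTF-8.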
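To make the byte counts concrete, here is a minimal Python sketch that encodes a few characters as UTF-8 and shows how many bytes each one takes. The helper name utf8_bytes and the sample characters are assumptions made for this example.

```python
from typing import List

def utf8_bytes(char: str) -> List[str]:
    """Return the UTF-8 bytes of a character as two-digit hex strings."""
    return [f"{byte:02X}" for byte in char.encode("utf-8")]

# Sample characters chosen for illustration: each needs a different byte count.
for char in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = utf8_bytes(char)
    print(f"U+{ord(char):04X} takes {len(encoded)} byte(s): {' '.join(encoded)}")
```

The letter “a” fits in a single byte, while the euro sign takes three and an emoji takes four, which is why UTF-8 is compact for English text yet still covers every character in the standard.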


Unicode itself only assigns each character a “code point,” which is a number. For example, the capital letter “A” is code point U+0041 (65 in decimal). The code point uniquely identifies that one letter or character, but it says nothing about how to display it in terms of size, shape, or font.
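As a rough illustration, Python's built-in ord and chr functions convert between a character and its code point, and the standard unicodedata module returns the character's official name. The describe helper below is a hypothetical name used only for this sketch.

```python
import unicodedata

def describe(char: str) -> str:
    """Format a character's code point and official Unicode name."""
    code_point = ord(char)  # the number Unicode assigns to this character
    return f"U+{code_point:04X} {unicodedata.name(char)}"

print(describe("a"))  # U+0061 LATIN SMALL LETTER A
print(describe("@"))  # U+0040 COMMERCIAL AT
print(chr(0x41))      # A (going from the number back to the character)
```

Nothing in the output mentions a font or a size; the code point identifies the character and nothing more.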


The software that you use, together with the computer’s operating system and installed fonts, determines how a character or letter will appear on the screen. Unicode, in turn, identifies that character or letter precisely, so that a different application or operating system can interpret it correctly. In other words, the way your computer displays the letter “A” may be different from how another computer does it, but both use Unicode to be sure that they’re talking about the same letter.

