Displaying Unicode characters from supplementary planes in Windows®


assembly date

2009, July 31.

author(s)

Balla Marcell

keywords

  • unicode
  • fonts
  • windows®

The information in this document is based on an i18nguy article. However, this document goes further by describing how to actually create and display supplementary Unicode characters.

Let us begin with some basics. Unicode is a standard that describes how characters from different languages have to be represented. This standard is necessary for multilingual applications to work consistently. Unicode assigns every character a code point in the range U+0000 to U+10FFFF, so a single character never needs more than 4 bytes (32 bits) in any of the standard encodings. The code space is divided into 17 planes of 65,536 code points each, which leaves room for 1,114,112 code points (Latin, Thai, Chinese, etc.).

The Unicode standard offers three main formats for storing characters: UTF-8, UTF-16 and UTF-32. While UTF-8 is the most widely adopted format, the others are used as well. Microsoft chose UTF-16 for its operating system, Windows®, which is therefore probably the most popular example of UTF-16 in use. This article shows how to display any Unicode character on Windows® using UTF-16.

What exactly are supplementary planes and surrogates? This question is easily answered. The UTF-16 format stores each character of the first plane (U+0000 to U+FFFF) in a single 16-bit unit. Unicode defines many more code points than fit into 16 bits, however, and these have to be stored as well. To accomplish this, characters in the range U+10000 and above (the supplementary planes) are stored as two 16-bit units, called a surrogate pair.

The Unicode standard describes how these pairs of 16-bit units have to be formatted. This is mainly about shifting and setting bits, and you never have to bother with it yourself unless you are a programmer writing your own Unicode converter class. The Unicode Standard, Version 5.0 book describes everything there is to know about how to convert and format Unicode characters.
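For readers who do want to see the bit shifting, the following is a minimal sketch of the UTF-16 surrogate calculation in Python. The function names `to_surrogate_pair` and `from_surrogate_pair` are made up for this illustration and do not come from any library.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # upper 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10302)
print(hex(high), hex(low))
```

For U+10302 the offset is 0x302, so the upper 10 bits are zero and the lower 10 bits are 0x302.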

But the goal of this article is merely to teach you how to get Windows® to display supplementary characters. Begin with the following steps:

  1. Ensure that "usp10.dll" is located under %WINDIR%\system32
  2. Open the registry editor and search for the following key: [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback]
  3. Add it if it is missing
  4. Also add a string called "Plane1" within this key
  5. Set the value of this string to the font which contains "supplementary characters" (see note below)

Note: If you do not have any font that contains characters from the "supplementary planes" (which is very likely at the time of writing), you will have to add them yourself. Read on to find out how you can do this. I found that "Arial Unicode MS" is the most complete font yet.
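As an alternative to editing the registry by hand, the key and the "Plane1" string can be put into a .reg file and imported. The sketch below only builds the text of such a file; the helper name `build_reg_file` and the font name "My Extended Font" are placeholders invented for this example. It assumes the key is spelled "Windows NT" (with a space), as that branch appears in the registry.

```python
# Registry key from the steps above, written with "Windows NT" (with a space).
KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\LanguagePack\SurrogateFallback"
)


def build_reg_file(font_name: str) -> str:
    """Return the text of a .reg file that sets Plane1 to the given font."""
    # Backslashes and double quotes must be escaped inside .reg string values.
    escaped = font_name.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + KEY_PATH + "]",
        '"Plane1"="' + escaped + '"',
        "",
    ]
    return "\r\n".join(lines)


# "My Extended Font" is a hypothetical font name; use the name of your own font.
print(build_reg_file("My Extended Font"))
```

The printed text can be saved to a file with the extension .reg and imported by double-clicking it. Back up the registry first, since importing requires administrator rights and changes a system-wide setting.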

Check the related links section for character charts. Follow these steps to add characters from the supplementary planes to a font of your choice:

  1. Install FontCreator. (Check the related links section above)
  2. Open the font file (.ttf) you want to add characters to using FontCreator. (See note below)
  3. Select "Insert->Characters..." and answer the warning with "Yes".
  4. Now you will see a dialog similar to Figure 1. (That figure shows the results of the following steps. So be patient and keep on reading further.)
  5. Place the cursor in the "Go to Code Point" text field, enter the code of the character you want to add (e.g. $10302) and hit the "Go" button.
  6. The "Selected Character" text field should now contain the official Unicode description for the character which corresponds to the entered code point.
  7. There is no preview available on the left because the font does not contain this character. We will add it now.
  8. Click "Add" to add this character to the font.
  9. Close the dialog by pressing "OK" and go to the bottom of the list of glyphs, where your newly added character should be located. (The rectangle is still empty. Figure 2 shows the main window with the rectangle already containing the character's shape.)
  10. Double clicking the rectangle lets you create the shape of the character. The easiest way to create a character is to use the "Insert contour" tool and draw a closed outline of the shape.
  11. After your character is ready click the floppy symbol in the toolbar to save it. (It can take a while to save data for big font files like "Arial Unicode MS".)
  12. The last thing to do is to copy the extended font into the directory of registered fonts under Windows®. (See note below)

Note: If you want to update a registered system font, simply enter "fonts" under "Start->Run..." and drag the font icon from the fonts window into an Explorer window. After changing the font, drag it back into the fonts window.

Figure 1: Adding characters in the FontCreator application

Figure 2: List of glyphs in FontCreator showing the newly added character

Applications that make use of Windows® controls (including Edit and RichEdit fields) automatically support supplementary characters if they are handled as defined by the Unicode standard (as surrogate pairs). For the example character U+10302 (OLD ITALIC LETTER KE) the surrogate pair is 0xD800 0xDF02. Check the related links section for an MSDN article containing more information about supporting surrogates and supplementary characters under Windows®.

Figure 3 shows a Windows® MFC application with a grid containing CEdit text fields. Look at the (highlighted) contents of the last column in the first row. It contains the newly added character U+10302 from the supplementary plane, preceded by the letter 'c'.

Figure 3: MFC application with a grid containing a Unicode character from the supplementary plane
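To check which 16-bit values a Windows® control receives for the cell contents shown in Figure 3, the text can be encoded with a standard UTF-16 codec. The sketch below uses Python's built-in "utf-16-le" codec; the helper name `utf16_code_units` is invented for this illustration.

```python
import struct


def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")              # 2 bytes per code unit, no BOM
    count = len(data) // 2
    return list(struct.unpack("<%dH" % count, data))


# The letter 'c' followed by U+10302 (OLD ITALIC LETTER KE).
sample = "c\U00010302"
print([hex(unit) for unit in utf16_code_units(sample)])
```

The string holds two characters but three code units: one for 'c' and a surrogate pair for U+10302. An application that counts or indexes text by 16-bit units has to keep this difference in mind.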

Summary: This article showed you how to add missing characters from the supplementary planes to the font of your choice. This is necessary, for example, when testing Windows® applications that have to deal with the full range of Unicode characters.
