
Section 6.22 Unicode Characters

PreTeXt supports (and encourages) the use of Unicode characters. Here are some relevant comments.

  • Unicode characters will migrate well to any output format based on HTML. Most browsers will have a variety of fonts with glyphs to realize these characters.

  • LaTeX will not always behave as smoothly. To begin with, you will definitely want to use the xelatex engine to build a PDF. Then you need to be sure your system has a font with the necessary characters, and you must make that font known to xelatex. We are working out the details of the best way to accomplish this.

  • How do you get a Unicode character into your source? In part this is specific to your operating system and editor, so is outside the scope of this guide, but we have hints below for popular operating systems.

  • You can always place a Unicode character in your source using XML syntax. The first thing an XML parser will do is convert this syntax into a character. The number of the SECTION SIGN in hexadecimal is A7, so the syntax &#xA7; is identical to the character §. Of course, this will get tedious fast.

  • The Full Unicode Input utility at www.cs.tut.fi/~jkorpela/fui.html will allow you to specify a chunk of 256 consecutive Unicode numbers and then you can click on characters to make a string of several or many. You can cut/paste these into your source, or convert the whole lot to XML syntax all at once.

  • Unicode characters have standardized names. You can find these, and more information, including font support, at the Unicode section of FileFormat.info, www.fileformat.info/info/unicode/. If you are struggling to find a specific character, then using this site's name in a search will often quickly locate what you need. Be sure to experiment with the test pages there for browser and font support (including checking your local configuration).

  • Warning: do not use Unicode characters as a way to get mathematical symbols (that is delegated to our use of LaTeX syntax). And do not use Unicode when we have provided an empty element for a character, especially when that character may be used in a markup syntax for some output, such as LaTeX, HTML, JSON, Markdown, …

    For example, if you put many naked hash symbols (#) in your source, then you will get nice HTML, but when you try to produce a PDF via LaTeX you will have a train wreck on your hands when you compile the LaTeX. Instead, be sure to always use the provided <hash /> element. Always. Other empty elements are conveniences, which spare you from looking up Unicode numbers and make your source more readable, rather than a necessity to avoid special characters. An example is <times />, for use outside of a strictly mathematical setting: “I bought a 2×4 at the lumberyard.”
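The remarks above about XML syntax and standardized names can be checked with a short script. What follows is only a minimal sketch in Python, which is not part of PreTeXt; it relies on the standard library modules xml.etree.ElementTree and unicodedata, and the helper to_char_refs is a hypothetical name introduced here for illustration. It confirms that an XML parser treats &#xA7; and § identically, looks up a character's standardized name, and converts a string to XML syntax all at once.

```python
import unicodedata
import xml.etree.ElementTree as ET


def to_char_refs(text):
    """Hypothetical helper: replace each non-ASCII character
    with an XML hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch))
        for ch in text
    )


# The parser converts the reference into the character itself,
# so both sources below yield exactly the same text.
parsed = ET.fromstring("<p>See &#xA7; 6.22</p>").text
literal = ET.fromstring("<p>See \u00A7 6.22</p>").text
same = parsed == literal

# Standardized name of a character, and the reverse lookup by name.
name = unicodedata.name("\u00A7")
by_name = unicodedata.lookup("SECTION SIGN")

# Convert a pasted string (multiplication sign, section sign) to XML syntax.
refs = to_char_refs("2\u00D74 \u00A7")

print(same, name, refs)
```

A script like this may be handy when your editor or fonts do not display a character reliably, since the converted source contains only ASCII. It does nothing about characters such as # that are special in LaTeX, so the warning about the provided empty elements still applies.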

Subsection 6.22.1 Unicode Support in OSX

Mitch Keller reports on 2017-01-12 a way to get some popular characters with OSX. Use the Keyboard preference pane under System Preferences. In there, you can enable

Show Keyboard, Emoji, & Symbols Viewers in menu bar

Once you activate the keyboard viewer, you get a keyboard on your screen. When you hold down opt, it shows you what other symbol you would get if you press opt+letter. For instance, opt+w gives a summation sign (which looks like an upper-case Greek sigma) and opt+= gives a not-equals sign (neither of which we can handle when processing the LaTeX version of this guide). To get ä, you type opt+u and then press a. The keys for diacritical marks such as this one are highlighted in orange while you hold opt. The shift key can produce variations of some characters, such as quote marks (dumb versus smart).

Subsection 6.22.2 (*) Unicode Support in Linux

Subsection 6.22.3 (*) Unicode Support in Windows