
Section 3.13 Special Characters

An advantage of XML syntax is that very few characters are reserved for the language's use, and thus very few characters need to be escaped. Of course, there is always the need to escape the escape character.

The escape character for XML is the ampersand, &. The other dangerous character is the left angle bracket, the “less than,” <. Mostly to be symmetric, we also handle the right angle bracket, the “greater than,” >, similarly. Single and double quotation marks are used to delimit attributes, so are part of the XML specification, but do not present difficulties in narrative text.

In normal writing, always use the empty elements <ampersand/>, <less/>, and <greater/>. Inside of mathematics elements, or code for images written in LaTeX, always use the pre-defined macros, \amp, \lt, \gt. In verbatim text (such as programs) always use the XML entities &amp;, &lt;, &gt;.
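As a rough illustration of these three rules, here is a small sketch of source markup. The elements <ampersand/>, <less/>, and <greater/>, the macros, and the entities come from the paragraph above; the surrounding <p>, <m>, <md>, <mrow>, and <c> elements and the sentences themselves are assumptions chosen only for the example.

```xml
<!-- Narrative text: use the empty elements -->
<p>Salt <ampersand/> pepper; a value <less/> 10 but <greater/> 5.</p>

<!-- Mathematics: use the pre-defined macros -->
<p>If <m>a \lt b</m> and <m>b \lt c</m>, then <m>c \gt a</m>.
<md>
  <mrow>x \amp = 2 + 3</mrow>
  <mrow>  \amp = 5</mrow>
</md>
</p>

<!-- Verbatim text: use the XML entities -->
<p>The test reads <c>if (a &lt; b &amp;&amp; b &gt; 0)</c>.</p>
```

In each case the raw characters &, <, and > never appear unescaped in the source, so the file remains well-formed XML.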

If you consistently follow the rules in the previous paragraph you will avoid a descent into escape-character hell and avoid a lot of head-scratching. In particular, you should have no need of the <![CDATA[ ]]> mechanism of XML, so just forget we even mentioned it.

Print and PDF output is generated via LaTeX, which has a good many special characters. So to preserve conversion to this format, you should consistently use the provided empty elements for these characters. Here are the characters and their corresponding elements.

# <hash/>
$ <dollar/>
% <percent/>
^ <circumflex/>
& <ampersand/>
_ <underscore/>
{ <lbrace/>
} <rbrace/>
~ <tilde/>
\ <backslash/>
Table 3.13.1 LaTeX's reserved characters and their elements
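A brief sketch of how a few of these elements might appear in running text follows. The element names are taken from the table; the sentences and the enclosing <p> elements are hypothetical.

```xml
<p>Issue <hash/>42: the fee is <dollar/>5, or 10<percent/> of the total.</p>
<p>Write the set <lbrace/>1, 2<rbrace/> and the name snake<underscore/>case in a sentence.</p>
```

Written this way, none of the reserved characters reaches the print or PDF conversion as a raw keystroke.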

There are some other empty elements, which are conveniences for certain characters, or sequences of characters, that are difficult or unusual in LaTeX and also somewhat obscure as Unicode characters. Two examples are the copyright symbol, ©, and constructions like the abbreviation “e.g.” for exempli gratia. We will document these later.