This document defines a subset of XML called canonical XML. The intended use of canonical XML is in testing XML processors, as a representation of the result of parsing an XML document.
Every well-formed XML document has a unique structurally equivalent canonical XML document. Two structurally equivalent XML documents have a byte-for-byte identical canonical XML document. Canonicalizing an XML document requires only information that an XML processor is required to make available to an application.
A canonical XML document conforms to the following grammar:
CanonXML  ::= Pi* element Pi*
element   ::= Stag (Datachar | Pi | element)* Etag
Stag      ::= '<' Name Atts '>'
Etag      ::= '</' Name '>'
Pi        ::= '<?' Name ' ' (((Char - S) Char*)? - (Char* '?>' Char*)) '?>'
Atts      ::= (' ' Name '=' '"' Datachar* '"')*
Datachar  ::= '&amp;' | '&lt;' | '&gt;' | '&quot;'
            | '&#9;' | '&#10;' | '&#13;'
            | (Char - ('&' | '<' | '>' | '"' | #x9 | #xA | #xD))
Name      ::= (see XML spec)
Char      ::= (see XML spec)
S         ::= (see XML spec)
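The Datachar production is effectively an escaping rule: the four markup characters and the three white-space characters tab, line feed, and carriage return always appear as references, and every other character appears literally. The Pi production adds two details: exactly one space follows the target even when the data is empty, and the data never begins with white space. The following Python sketch renders those two productions as string functions; the names `datachars` and `pi` are hypothetical, and it assumes PI data is passed in as the parser reported it.

```python
# Characters that Datachar never allows literally, mapped to the
# reference the grammar requires in their place.
_DATACHAR_REFS = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
    ord('"'): "&quot;",
    0x9: "&#9;",
    0xA: "&#10;",
    0xD: "&#13;",
}

# The XML S production: space, tab, carriage return, line feed.
_XML_S = " \t\r\n"


def datachars(text: str) -> str:
    """Render character data or an attribute value as Datachar*."""
    return text.translate(_DATACHAR_REFS)


def pi(target: str, data: str) -> str:
    """Render a processing instruction per the Pi production.

    Exactly one space follows the target, even when data is empty.
    Leading white space is dropped so the data starts with (Char - S),
    and data containing '?>' is rejected.
    """
    data = data.lstrip(_XML_S)
    if "?>" in data:
        raise ValueError("processing instruction data must not contain '?>'")
    return f"<?{target} {data}?>"
```

For example, `pi("target", "")` yields `<?target ?>`, with the mandatory space still present.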
Attributes are sorted by name in lexicographical order (in Unicode bit order).
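A minimal sketch of the ordering rule in Python, assuming "Unicode bit order" means comparing code point values. Python's built-in `str` comparison does exactly that, and code point order coincides with the byte order of the UTF-8 encoding, so sorting the names directly is enough. (UTF-16 code-unit order would differ for characters outside the Basic Multilingual Plane.) The functions `canonical_atts` and `stag` are hypothetical, and attribute values are assumed to be already escaped as Datachar*.

```python
def canonical_atts(attrs: dict) -> str:
    """Render attributes per the Atts production, sorted by name.

    Assumes each value is already escaped as Datachar*. sorted() on
    str compares code points, so 'B' (U+0042) precedes 'a' (U+0061)
    and 'é' (U+00E9) follows 'z' (U+007A).
    """
    return "".join(f' {name}="{attrs[name]}"' for name in sorted(attrs))


def stag(name: str, attrs: dict) -> str:
    """Render a start tag per the Stag production."""
    return f"<{name}{canonical_atts(attrs)}>"
```

Note that this is not case-insensitive or locale-aware ordering: every uppercase ASCII letter sorts before every lowercase one.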
A canonical XML document is encoded in UTF-8.
Ignorable white space is considered significant and is treated equivalently to data.
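Since canonicalizing needs only what a processor reports to the application, a SAX content handler can produce the canonical form directly. The sketch below uses Python's standard `xml.sax` (expat-based) parser with namespace processing left off, its default, so names arrive as written. `CanonicalWriter` and `canonicalize` are hypothetical names. It routes `ignorableWhitespace` through the same path as `characters` (a non-validating reader may never call it, so the override mainly records the rule), encodes all output as UTF-8, and omits comments, the XML declaration, and the document type declaration because a `ContentHandler` never receives them.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

# References required by the Datachar production.
_DATACHAR_REFS = {
    ord("&"): "&amp;", ord("<"): "&lt;", ord(">"): "&gt;", ord('"'): "&quot;",
    0x9: "&#9;", 0xA: "&#10;", 0xD: "&#13;",
}


class CanonicalWriter(ContentHandler):
    """SAX handler that writes canonical XML, UTF-8 encoded, to a binary stream."""

    def __init__(self, out):
        super().__init__()
        self._out = out

    def _write(self, text):
        self._out.write(text.encode("utf-8"))

    def startElement(self, name, attrs):
        # Attributes sorted by name (code point order); values escaped as Datachar*.
        atts = "".join(
            f' {n}="{attrs[n].translate(_DATACHAR_REFS)}"'
            for n in sorted(attrs.getNames())
        )
        self._write(f"<{name}{atts}>")

    def endElement(self, name):
        # Empty elements are always written as a start/end tag pair.
        self._write(f"</{name}>")

    def characters(self, content):
        self._write(content.translate(_DATACHAR_REFS))

    def ignorableWhitespace(self, whitespace):
        # Ignorable white space is treated exactly like character data.
        self.characters(whitespace)

    def processingInstruction(self, target, data):
        # One space after the target, even when there is no data.
        self._write(f"<?{target} {data or ''}?>")


def canonicalize(document: bytes) -> bytes:
    """Return the canonical XML form of a well-formed document."""
    out = io.BytesIO()
    xml.sax.parseString(document, CanonicalWriter(out))
    return out.getvalue()
```

As a usage example, `<doc b="2" a='1'>x &amp; y<!-- note --><e/></doc>` and `<doc a="1" b="2">x &#38; y<e></e></doc>` both canonicalize to the same bytes, `<doc a="1" b="2">x &amp; y<e></e></doc>`, so a byte comparison of the canonical forms serves as a test of structural equivalence.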
James Clark