XML - eXtensible Markup Language
What is XML and why use it?
XML is a tag-based format like HTML, but it describes the content rather
than the presentation of that content.
Tags may also have attributes, which makes them resemble objects in programming
languages. In fact, XML may be used to serialize objects from programming
languages.
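As a minimal sketch of that idea, the following Python code serializes an object to XML and reads it back with the standard xml.etree.ElementTree module. The Book class, its fields and the choice of which field becomes an attribute are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical class, used only to illustrate serialization.
    ref: str
    title: str
    year: int


def book_to_xml(book: Book) -> str:
    """Serialize a Book: one field becomes an attribute, the others child tags."""
    element = ET.Element("book", {"ref": book.ref})
    ET.SubElement(element, "title").text = book.title
    ET.SubElement(element, "year").text = str(book.year)
    return ET.tostring(element, encoding="unicode")


def book_from_xml(text: str) -> Book:
    """Rebuild the object from its XML form."""
    element = ET.fromstring(text)
    return Book(
        ref=element.get("ref"),
        title=element.findtext("title"),
        year=int(element.findtext("year")),
    )


serialized = book_to_xml(Book("b1", "An Example", 2001))
print(serialized)
```

Reading the text back with book_from_xml(serialized) gives an object equal to the original one.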
As its name suggests, XML is extensible and it serves as the basis for many descriptive languages as discussed below.
XML is increasingly used as a document format, and it is now the basis of the file formats of Microsoft Office and LibreOffice.
XML 1.0 first became a W3C Recommendation on 10 February 1998 (its fourth edition is dated 16 August 2006), but its history goes back to 1996, as indicated in the history from the W3C.
Advantages of XML
Any type of data may be described in XML, provided there is a grammar
for its structure (the tags). This universality allows it to be used in any context and on any system.
Its tree structure and extensibility allow it to describe anything. A document is easy for a script to parse while remaining easy for a human to read.
As the tags contain raw data, it is easy to perform searches on them.
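The following sketch shows two such searches with Python's standard xml.etree.ElementTree module, one on an attribute and one on the text held by a tag. The catalog and its tag names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this example.
CATALOG = """
<catalog>
  <book lang="en"><title>First Steps</title><price>12.50</price></book>
  <book lang="fr"><title>Premiers pas</title><price>9.90</price></book>
  <book lang="en"><title>Going Further</title><price>20.00</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Search by attribute value, using the XPath subset that ElementTree supports.
english_titles = [book.findtext("title") for book in root.findall("book[@lang='en']")]

# Search on the raw text contained in a tag.
cheap_titles = [
    book.findtext("title")
    for book in root.iter("book")
    if float(book.findtext("price")) < 15
]

print(english_titles)
print(cheap_titles)
```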
Defining the grammar
Before writing an XML document, you should write a Document Type
Definition (DTD). A DTD declares a grammar of tags, and an XML document is
an instance of that grammar, as an object is an instance of a class,
or as a program is an instance of a language.
The DTD may be included in the XML document, or linked by a URL (web
address).
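As an illustration, here is a sketch of a document that carries its own DTD, read with Python's standard xml.etree.ElementTree module. The note, to and body tags are hypothetical. Note that this parser only checks that the document is well formed; it does not validate it against the DTD, which would require a validating parser that the Python standard library does not include.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose DTD is included in the DOCTYPE declaration.
DOCUMENT = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority CDATA #IMPLIED>
]>
<note priority="high">
  <to>Reader</to>
  <body>The grammar above allows exactly this structure.</body>
</note>
"""

# ElementTree checks well-formedness only; the DTD is read but not enforced.
root = ET.fromstring(DOCUMENT.encode("utf-8"))
children = [child.tag for child in root]
print(root.tag, root.get("priority"), children)
```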
Without the DTD, the XML document may be used but not checked for validity. A document is validated with:
- DCD (Document Content Description for XML).
DCD is a language that provides a structural schema facility (using XML syntax), which replaces the functions of the DTD to describe constraints on tags and content of XML documents. It also describes datatypes and relationships in databases.
DCD incorporates a subset of XML-Data, and is an RDF vocabulary.
- A schema, like a DTD, describes the grammar of tags for validating XML documents.
Schemas use the same syntax as XML (DTDs use a different one), allow custom datatypes and have many predefined ones.
Valid format for an XML document
An XML document has the following form:
<?xml version="1.0" encoding="utf-8" ?>
<xml>
  <tag attribute1="value" attribute2="value">
    ... actual content ...
  </tag>
</xml>
The file begins with an XML declaration. It is optional, but some tools may require the version and encoding information it provides.
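A root tag encompasses all others. It is here called <xml>, but it is only an example. Any other name is possible, and in practice another name is preferable, because names beginning with "xml" are reserved by the specification.
The root tag can include other tags, which can in turn either enclose other tags or contain text. Each can have attributes in the form name="value", and an attribute name may appear only once in a given tag.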
If a tag does not have content, it can be written in short form without using the name to close it:
<tag attribute="value" />
Many other formalisms can be added to this basic structure, but it is sufficient for most documents.
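The rules above (a single root tag, properly nested and closed tags) can be checked by simply handing the text to a parser. The sketch below does so with Python's standard xml.etree.ElementTree module; the is_well_formed helper and the sample documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One root tag, a nested tag with an attribute, and an empty tag in short form.
GOOD = '<doc><tag attribute="value">actual content</tag><empty attribute="value" /></doc>'

# The same document with the closing </tag> forgotten.
UNCLOSED = '<doc><tag attribute="value">actual content<empty attribute="value" /></doc>'

# Two tags at top level, so no single root encompasses the others.
TWO_ROOTS = "<first /><second />"

print(is_well_formed(GOOD))
print(is_well_formed(UNCLOSED))
print(is_well_formed(TWO_ROOTS))
```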
How to use XML
To use an XML document, you need a parser. Several kinds of parsers
exist.
The parser may translate the document into a tree in memory, accessible
through the Document Object Model (DOM). Alternatively, you can associate
functions with tags, in the SAX (Simple API for XML) way.
Implementations of parsers are available for all programming languages.
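To make the two approaches concrete, here is a sketch in Python that reads the same document both ways, using only the standard library: xml.sax for the event-based style and xml.etree.ElementTree for the tree-based style. The document and the TagCounter handler are invented for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
DOCUMENT = "<library><book>One</book><book>Two</book><dvd>Three</dvd></library>"


class TagCounter(xml.sax.ContentHandler):
    """SAX style: the parser calls startElement each time it meets an opening tag."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


# Event-based parsing: only what the handler stores is kept in memory.
handler = TagCounter()
xml.sax.parseString(DOCUMENT.encode("utf-8"), handler)

# Tree-based parsing: the whole document becomes a tree that can be walked freely.
root = ET.fromstring(DOCUMENT)
book_titles = [book.text for book in root.findall("book")]

print(handler.counts)
print(book_titles)
```

The event-based style suits very large documents, since no tree is built, while the tree-based style is more convenient when the program needs to move around the document.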
XML, like HTML, has its stylesheet language, named XSL, which provides rules to transform
XML into another format (XHTML for example).
Tools have been standardized (by the W3C) to access XML documents.
- XLink (XML Linking Language)
A language (that uses XML syntax) that can be inserted into XML resources to describe unidirectional hyperlinks (like HTML) or more complex multi-ended and typed links. XLink may be used with XPointer.
- XPath defines how to reach specific elements.
- XPointer (XML Pointer Language)
A language that can address selected elements in the structure of XML documents. It is based on the XPath language.
- DOM (Document Object Model)
API to dynamically access and change either the content or the structure of XML documents. It is a platform and language independent interface.
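As an example of the last item, the sketch below uses xml.dom.minidom, the DOM implementation shipped with Python, to change first the content and then the structure of a document. The page, title and author tags are hypothetical.

```python
from xml.dom import minidom

# Hypothetical document, invented for this example.
document = minidom.parseString("<page><title>Draft</title></page>")
page = document.documentElement

# Change the content: replace the text held by the <title> element.
title = page.getElementsByTagName("title")[0]
title.firstChild.data = "Final"

# Change the structure: create a new element and attach it to the root.
author = document.createElement("author")
author.setAttribute("role", "editor")
author.appendChild(document.createTextNode("A. Writer"))
page.appendChild(author)

result = page.toxml()
print(result)
```

The same method names (getElementsByTagName, createElement, appendChild) exist in the DOM bindings of other languages, which is what makes the interface language independent.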
Applications beyond data storage
The power of XML goes beyond simple data storage: it is also a language for application interfaces.
- XAML is a format similar to XUL for the .NET platform and is used by the Modern UI in Windows.
- XHTML is a version of HTML compatible with XML.
- HTML 5, the format of Web documents, is now also an interface language for web applications.
It is also possible to make XML executable by inserting tags that are recognized as instructions by a special parser.
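A toy version of such a special parser can be sketched in a few lines of Python. The instruction set used here (the set, add and print tags) is entirely invented for the example and does not correspond to any existing language.

```python
import xml.etree.ElementTree as ET

# Hypothetical instruction set: <set>, <add> and <print> are invented tags.
PROGRAM = """
<program>
  <set name="total" value="2" />
  <add name="total" value="40" />
  <print name="total" />
</program>
"""


def run(source: str) -> list:
    """Interpret each child tag of the root as one instruction, in order."""
    variables = {}
    output = []
    for instruction in ET.fromstring(source):
        name = instruction.get("name")
        if instruction.tag == "set":
            variables[name] = int(instruction.get("value"))
        elif instruction.tag == "add":
            variables[name] += int(instruction.get("value"))
        elif instruction.tag == "print":
            output.append(str(variables[name]))
        else:
            raise ValueError(f"unknown instruction: {instruction.tag}")
    return output


printed = run(PROGRAM)
print("\n".join(printed))
```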
Extensions and languages based on XML
XSL (eXtensible Stylesheet Language)
An XSL document is a set of transformation rules that match chosen elements
and attributes in XML documents and map them to new structures.
A set of rules for translating an XML document into HTML is the best example
of an XSL stylesheet, but XML can be translated into anything.
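The difference between XSL and XSLT is that XSL is the name of the whole family, which comprises XSLT for transformations, XPath for addressing parts of a document, and XSL-FO for formatting, while XSLT
itself converts an XML document into another XML document, into HTML or into plain text.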
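The Python standard library has no XSLT processor, so the sketch below is not XSLT. It only imitates the principle of rule-based transformation in plain Python: each source tag is matched against a rule that says which HTML tag replaces it. The rule table and the source tags are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each source tag is mapped to an HTML tag.
RULES = {"article": "div", "heading": "h1", "para": "p"}


def transform(node: ET.Element) -> ET.Element:
    """Apply the matching rule to a node, then recurse into its children."""
    # Tags without a rule fall back to <span>.
    result = ET.Element(RULES.get(node.tag, "span"))
    result.text = node.text
    result.tail = node.tail
    for child in node:
        result.append(transform(child))
    return result


SOURCE = "<article><heading>XML</heading><para>Tags describe content.</para></article>"
html = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode", method="html")
print(html)
```

A real stylesheet expresses the same kind of rules declaratively, as templates selected by XPath patterns, and leaves the traversal to the processor.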
SVG (Scalable Vector Graphics)
A language that describes graphical objects, which can be scripted
with JavaScript to make animations. Current browsers display an SVG document
as a web page natively, without a plug-in.
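Because SVG is plain XML, a drawing can be produced with any XML library. The following sketch builds a minimal one with Python's standard xml.etree.ElementTree module; the shape, sizes and colour are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Build a small drawing: a red circle centred in a 100 x 100 canvas.
svg = ET.Element("svg", {"xmlns": SVG_NS, "width": "100", "height": "100"})
ET.SubElement(svg, "circle", {"cx": "50", "cy": "50", "r": "40", "fill": "red"})

drawing = ET.tostring(svg, encoding="unicode")
print(drawing)
```

Saving the printed text in a file with the .svg extension should be enough to open it in a browser.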
SMIL (Synchronized Multimedia Integration Language)
Multimedia language that combines data from various sources, to make animations.
XQuery (XML Query)
Specification of a query language that allows XML documents to be queried like databases.
XHTML (eXtensible HyperText Markup Language)
HTML 4 rewritten in XML, with an associated DTD.
XForms (XML Forms)
Defining forms.
RDF (Resource Description Framework)
Standard to describe various data, including images. It adds a description of the structure to that of the data.
RSS
A set of formats for syndication, RSS 1.0 being defined in RDF, and RSS 2.0 in XML, as is Atom.
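As a short illustration, an RSS 2.0 feed can be read like any other XML document. The sketch below extracts the entries of a feed with Python's standard xml.etree.ElementTree module; the feed itself, with its titles and example.com links, is a placeholder invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the titles and links are placeholders.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <description>Placeholder channel</description>
    <item><title>First entry</title><link>https://example.com/1</link></item>
    <item><title>Second entry</title><link>https://example.com/2</link></item>
  </channel>
</rss>
"""

# In RSS 2.0 the <rss> root holds one <channel>, which holds the <item> entries.
channel = ET.fromstring(FEED.encode("utf-8")).find("channel")
entries = [(item.findtext("title"), item.findtext("link")) for item in channel.findall("item")]

print(channel.findtext("title"))
for title, link in entries:
    print(title, link)
```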
XML Oriented language
Scriptol is a programming language that has a syntax similar to XML and embeds XML in the source.
Schema
Standard format to validate any XML document.
Tools
XML parsers
- Xerces. XML parser that implements DOM, SAX and schemas (C++, Java).
- LibXML. Tree-based parser.
- Expat. Event-based parser.
Other tools
- Xalan implements XSLT and XPath to transform XML into HTML (C++, Java).
- Cocoon is an XML server (Java).
- Majix converts an RTF file into XML.
- XCheck is a checker for well-formed documents.
W3C Specifications
- XML 1.0. Recommendation.