XML Tutorial: What you need to know about XML

Author: Jaidev

The facets of XML

There are two aspects of XML that are relevant to study. The first is a programming-language-independent study of XML, while the second requires knowledge of an XML-capable programming language such as C#, Java or PHP.

XML, as we have seen in the last two chapters, is a specification that is a leaner and more streamlined derivative of SGML. Apart from XML itself, there are a number of closely related specifications that together constitute the XML family. These lend functional power, in a standardized way, to the largely structural XML representation.

If one needs to extend the capabilities of an XML framework, then one has to actually program with XML. For standard web development, for instance, actual XML programming is not required. However, if one were to write a transformation engine for an embedded system, XML programming would be required.

Finally, there are the actual XML-compliant languages for specific industry verticals. These are not really software topics; they call for domain-specific knowledge. Let us look at these branches in more detail.

The XML “knowledge tree” is shown below. There are three categories: Family of Specifications, Programming Facets and Domain Markups. Of these, this tutorial focuses on the XML family of specifications, although we do refer to programming aspects when required. The tutorial also uses some common domain markups for illustration and examples. Once you have a solid foundation in the XML specifications, using and extending them programmatically becomes far simpler and more effective.

Figure 1: XML Knowledge Tree

XML Family of Specifications

  1. XML Specification – This is the core XML specification, which covers documents, structures and conformance. It is available at http://www.w3.org/TR/REC-xml/
  2. XML Schema – XML Schema is a more advanced yet streamlined version of the classic DTD that we were introduced to in the last chapter. The latest documents are available in three parts: http://www.w3.org/TR/xmlschema-0/ , http://www.w3.org/TR/xmlschema-1/ and http://www.w3.org/TR/xmlschema-2/
  3. XSLT Specification – The XSL (eXtensible Stylesheet Language) Transformations specification covers the transformation of XML documents from one format to another. The XSL specification itself is available at http://www.w3.org/TR/xsl/ while the XSLT language specification is available at http://www.w3.org/TR/xslt
  4. XHTML Specification – This is the Extensible HyperText Markup Language, which is an XML version of the HTML specification. The key differences between XHTML and HTML are described in the specification itself, which is available at http://www.w3.org/TR/xhtml1/
  5. XPointer – The XML Pointer Framework has been set up to define, in a uniform way, the mechanism for addressing or pointing to the internal structures of XML resources using URI references. By references we mean things such as links, inclusions and resource descriptions. The basic specification is at http://www.w3.org/TR/xptr-framework/
  6. XLink – The XML Linking Language allows explicit associations between resources or parts of resources. These resources could be, for instance, files, images, documents, programs or query results. The associations (also known as links) are represented by actual XLink-conforming XML elements. The XLink specification is available at http://www.w3.org/TR/xlink/
  7. XPath – This is a very important XML specification that allows addressing the constituent parts of an XML document. For instance, if we want to perform a certain operation on only a part of an XML document, we need to specify the location of this part. This is done using the XPath notation. XPath is essential when doing transformations or specifying pointers. XPath is available at http://www.w3.org/TR/xpath
  8. XQuery – XQuery is a language for writing queries on XML data; its expressions build on the XPath notation. The queried XML data may be native XML (XML files or XML databases) or may be extracted from non-XML data stores (such as native spreadsheets or standard RDBMSes) and converted to XML by middleware. The XML Query project homepage can be found at http://www.w3.org/XML/Query while the XQuery language specification is available at http://www.w3.org/TR/xquery/
  9. XML (Communication) Protocol – XML Protocol deals with the way XML documents are communicated over a network, making the semantics of lower-layer communication largely transparent. The commonly accepted standard is the familiar Simple Object Access Protocol (SOAP), which can be found at http://www.w3.org/TR/soap/ Most of the XML Protocol activity is now subsumed under the Web Services Activity in the Architecture Domain, found at http://www.w3.org/2002/ws/
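
To make the XPath item in the list above a little more concrete, here is a minimal sketch in Python of addressing parts of a document with path expressions. It is only an illustration under stated assumptions: the catalog document and its element and attribute names are invented for this example, and the standard library module xml.etree.ElementTree supports only a limited subset of XPath rather than the full specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented purely for this illustration.
CATALOG = """\
<catalog>
  <book id="b1" lang="en">
    <title>XML Basics</title>
    <price>20</price>
  </book>
  <book id="b2" lang="de">
    <title>XSLT Kompakt</title>
    <price>35</price>
  </book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Address every <title> that sits inside a <book> directly under the root.
titles = [t.text for t in root.findall("./book/title")]

# Address only the <book> whose id attribute is "b2", then its <title>.
german_title = root.find("./book[@id='b2']/title").text

# Address a <book> by position (XPath positions start at 1, not 0).
first_id = root.find("./book[1]").get("id")

print(titles)
print(german_title)
print(first_id)
```

Each path string names a location in the document tree, which is the same idea that XSLT and XPointer rely on when they need to say which part of a document an operation applies to.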

Programming Facets




  1. Parsing – In order for an XML document to be used by an application, it must be parsed into a usable structure. XML parsers (also known as XML processors) provide access to the structure and content of XML documents. There are broadly two categories of parsers:
  2. Database Related – While discussing database-related facets, it is good to keep in mind the following natural components and relations in object-oriented XML applications.

    Figure 2: Database / XML / Object Relationships

    Looking at the associations in some detail, we have:

  3. Transformation Related – Transformation aspects refer to programming frameworks and APIs that allow the transformation of XML documents from one form to another. The transformations are typically done by layers or pipes of XSL transforms applied to the source document in sequence. One consequence of transformation APIs is the capacity for multichannel programming, in which common content is repurposed and sent to multiple endpoints by sensing each endpoint's capabilities and automatically applying an appropriate transform.
  4. XML IDEs – There are a number of XML editing platforms available. The basic features to look for in an XML IDE are:
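
Returning to the parsing facet (item 1 above), the following sketch shows what "parsing into a usable structure" can look like in practice. The two categories of parsers are not spelled out in the text as it stands, so this example assumes one common division: tree-based parsing, which loads the whole document into memory, and event-based parsing, which reports elements as they are read. The order document and the SkuCollector class are hypothetical names invented for this illustration; xml.dom.minidom and xml.sax are Python standard library modules.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, invented purely for this illustration.
ORDER = "<order><item sku='a1'>Pen</item><item sku='b2'>Ink</item></order>"

# Tree-based parsing: the whole document becomes an in-memory node tree
# that can be navigated in any order after parsing has finished.
dom = minidom.parseString(ORDER)
dom_skus = [e.getAttribute("sku") for e in dom.getElementsByTagName("item")]


# Event-based parsing: the parser calls back into a handler for each
# element as it is encountered; no complete tree is ever built.
class SkuCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.skus = []

    def startElement(self, name, attrs):
        # Called once for every opening tag in document order.
        if name == "item":
            self.skus.append(attrs.getValue("sku"))


collector = SkuCollector()
xml.sax.parseString(ORDER.encode("utf-8"), collector)
sax_skus = collector.skus

print(dom_skus)
print(sax_skus)
```

Both approaches recover the same information here. The tree-based style tends to be more convenient for small documents that need random access, while the event-based style keeps memory use low for large documents.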

Domain Markup Languages

Domain markup languages are XML-compliant markup languages for certain domains. The purposes they serve could be of two types: There are numerous such domain markups, so you might need to pick up knowledge of some of them when you start a development assignment in that particular industry.

The introduction is over! Now that we know what XML is, where it came from and what we need to know about it (including what will be covered in this tutorial), let us get down to the thick of things by looking at the basic concepts. Our base camp is set up. From now on, this tutorial makes its ascent along the trail of the “XML language and family of specifications”. That will be our ultimate goal.
Copyright© 2004-2006 Aleksey Nudelman