XML Tutorial Schema

by Jaidev

From DTD to Schema

 While closing the last chapter we studied a few drawbacks of DTDs:

* There is no formal datatyping scheme in DTDs. This means that there is no real way of specifying the type of content (unless you build the typing into the document yourself, which makes the document proprietary and forces you to write proprietary methods for handling it. Not nice, eh?).

* All definitions have a global scope. This implies that any definition holds good throughout and there is no way to limit this scope.

Apart from these two, there is another very important drawback of DTDs.

DTDs are themselves not written using standard XML document syntax. Since they were originally meant to support electronic publishing structures (under SGML), this was not really viewed as a drawback at that time. But with the evolution of XML in the late 90s, it became a real handicap: a DTD cannot be read, generated or transformed with the same tools that handle ordinary XML documents, so parsers need separate logic to work with DTDs.

The obvious solution was to create a structure definition language which:

* was richly typed,

* supported scoped definitions, and

* was defined using standard XML document syntax

That came to be known as the XML Schema.

XML Schema Related Specifications

Before we proceed to understand how XML schemas are written, let us look at some of the related schema specifications. Some of them are still in use, and together they give a good insight into the evolution of the XML Schema specification.

* XML-Data (1998) – A very early successor to DTD and precursor of XML Schemas

* XDR (1998) – Stands for “XML Data Reduced” and is Microsoft’s subset of XML-Data. MSXML (Microsoft’s core XML support services) Versions 2.0 and above support XDR.

* DCD (1998) – Stands for “Document Content Description” and is yet another subset of XML-Data.

* RDF (2000) – Stands for “Resource Description Framework”. It is a very generic metadata processing scheme.

A very good overview of these schemes and more is available at: http://www.xml.com/pub/a/2001/12/12/schemacompare.html and http://xml.coverpages.org/schemas.html#xml-data.

Now that we have briefly seen the related XML Schema specs, we can move on to understanding what actually goes into an XML schema.

Writing the Basic XML Schema

Let us jump right in, and see a barebones XML Schema:

Basic XML Schema
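The original figure is not reproduced here. As a stand-in, a minimal sketch of a barebones schema might look like the following. It assumes a single element named ARTICLE; the name is borrowed from later examples in this chapter and may not match the original figure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root element: binds the xs prefix to the XML Schema namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Content: a single element declaration (ARTICLE is an assumed name) -->
    <xs:element name="ARTICLE">
    </xs:element>
</xs:schema>
```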

Of course, the first line as expected is the XML version and encoding specification!

Root

Following that come the opening and closing tags of the root element, <xs:schema> and </xs:schema>. We can now see that, in contrast with the DTD specification, the Schema is a proper XML document structure, since it has a root element with content nested within it.

The key things to note with the root element are:

* The element name <xs:schema>, which is the name of the root of any Schema

* The namespace declaration xmlns:xs="http://www.w3.org/2001/XMLSchema", which binds the xs prefix to the namespace defined by the 2001 W3C XML Schema specification.

Elements

In the above example we have also seen the equivalent of a DTD element specification. These are in the lines:

XML Schema Element

which represent the start and end tags of the element. Once again, notice the strong XML document syntax as opposed to the corresponding <!ELEMENT ….> definition of the DTD. In its simplest form the syntax of the element is:

<xs:element name="element_name" type="element_type"/>

where name refers to the name of the element and type refers to the type of the element. We will discuss type definitions in detail in the next chapter as “typing” is one of the very distinctive features of Schemas. For now let us continue with attribute definitions.
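As an illustration of this syntax, the sketch below declares four elements using the built-in xs:string type. The element names are borrowed from the sample document later in this chapter; the choice of xs:string for each of them is an assumption made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Each declaration follows the name/type pattern shown above -->
    <xs:element name="TITLE" type="xs:string"/>
    <xs:element name="AUTHOR" type="xs:string"/>
    <xs:element name="ABSTRACT" type="xs:string"/>
    <xs:element name="BODY" type="xs:string"/>
</xs:schema>
```

A DTD would express the first of these as <!ELEMENT TITLE (#PCDATA)>, which says nothing about the type of the content.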

Attributes

XML Schema attributes are represented with the following syntax:

<xs:attribute name="attribute_name" type="attribute_type"/>

where name refers to the name of the attribute and type refers to the type of the attribute. For instance, in the example below the attribute name is “color” while the type is “string”. Apart from the above simple definition, there can be additional specifications on attributes. Just as we did with DTDs, we will study some of these by example.

| Specifier | Description | Example |
|---|---|---|
| optional | Optional attribute | <xs:attribute name="race" type="xs:string" use="optional"/> |
| required | Required attribute | <xs:attribute name="first" type="xs:string" use="required"/> |
| default | Default value, used when the attribute is omitted | <xs:attribute name="domain" type="xs:string" default="com"/> |
| fixed | Constant value that cannot be changed | <xs:attribute name="country" type="xs:string" fixed="US"/> |
| minOccurs maxOccurs | Minimum and maximum number of occurrences. These apply to element declarations only; an attribute can appear at most once, so its presence is controlled with use instead | <xs:element name="articleTitle" type="xs:string" minOccurs="1" maxOccurs="1"/> |
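To show where such declarations sit inside a schema, here is a hedged sketch that attaches the “color” attribute mentioned above, along with two attributes from the table, to an ARTICLE element. The xs:complexType and xs:sequence wrappers belong to the type system covered in the next chapter and are used here only as scaffolding; the grouping of these particular attributes on ARTICLE is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="ARTICLE">
        <xs:complexType>
            <xs:sequence>
                <!-- minOccurs/maxOccurs belong on elements, not attributes -->
                <xs:element name="TITLE" type="xs:string" minOccurs="1" maxOccurs="1"/>
            </xs:sequence>
            <!-- Attribute declarations follow the element content -->
            <xs:attribute name="color" type="xs:string" use="optional"/>
            <xs:attribute name="domain" type="xs:string" default="com"/>
            <xs:attribute name="country" type="xs:string" fixed="US"/>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

With this sketch, <ARTICLE color="red"> is acceptable, while <ARTICLE country="UK"> is not, because country is fixed to “US”.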

Annotating XML Schemas

Comments in XML Schemas are most commonly created using the “<xs:annotation>” and the “<xs:documentation>” elements.

The annotation should either be:

* Just after the opening <xs:schema ……> root tag if we want to annotate the entire schema; or

* Just inside the element definition, as its first child, if we want to annotate only that particular element

For instance, in the example below the definition of the element ARTICLE is annotated.

<xs:element name="ARTICLE">
    <xs:annotation>
        <xs:documentation>This is an annotation on the article element </xs:documentation>
    </xs:annotation>
</xs:element>
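The first placement, an annotation on the schema as a whole, can be sketched in the same way. The documentation text and the single ARTICLE declaration below are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Placed directly after the opening root tag, this annotates the entire schema -->
    <xs:annotation>
        <xs:documentation>This schema describes a collection of articles</xs:documentation>
    </xs:annotation>
    <xs:element name="ARTICLE">
        <!-- Placed as the first child, this annotates only the ARTICLE element -->
        <xs:annotation>
            <xs:documentation>This is an annotation on the article element</xs:documentation>
        </xs:annotation>
    </xs:element>
</xs:schema>
```

Ordinary XML comments (<!-- ... -->) are also allowed, but unlike xs:documentation they are not part of the schema's structure and tools generally ignore them.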

References to XML Schema

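Just as with the DTD, the XML Schema reference is positioned after the first line of the XML document and before the actual document content. Unlike a DTD reference, however, it is not a separate declaration: it is written as attributes on the root element. For instance, in the XML document below, the schema reference consists of the xmlns:xsi and xsi:noNamespaceSchemaLocation attributes of the ARTICLES element.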

<?xml version="1.0" encoding="UTF-8"?>
<ARTICLES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="D:\Articles.xsd">
<ARTICLE>
<ARTICLEDATA>
<TITLE/>
<AUTHOR/>
<ABSTRACT/>
<BODY/>
</ARTICLEDATA>
</ARTICLE>
</ARTICLES>
 

Once again, as with DTDs, the Schema can be available either as a local file (as in the above example) or at a URL. If it is available at a URL, the syntax is the same but the file location would be something like http://www.mydomain.com/schemas/Articles.xsd.
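The contents of Articles.xsd are not shown in this chapter. As a rough sketch only, a schema that the document above would satisfy could look like the following. It assumes that every leaf element holds plain text and that ARTICLES may contain any number of ARTICLE elements; the nested xs:complexType and xs:sequence constructs are explained in the next chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Articles.xsd; the real file may differ -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="ARTICLES">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="ARTICLE" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="ARTICLEDATA">
                                <xs:complexType>
                                    <xs:sequence>
                                        <xs:element name="TITLE" type="xs:string"/>
                                        <xs:element name="AUTHOR" type="xs:string"/>
                                        <xs:element name="ABSTRACT" type="xs:string"/>
                                        <xs:element name="BODY" type="xs:string"/>
                                    </xs:sequence>
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

The schema declares no target namespace, which is why the document refers to it with xsi:noNamespaceSchemaLocation.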

Validation

In an earlier chapter we learnt something about the validation of XML documents. Let us do a recap.

Validation refers to the checking of the structure of an XML document against its master structure specification which by now we know comes in two forms - the DTD or the Schema.

If there is a reference to a DTD or Schema in an XML document, as we have seen above, any program that reads, understands and helps do something useful with an XML document can either validate the document against it or choose to ignore the reference. Once again we have to qualify that such programs (typically called parsers!) may or may not have the capacity to validate an XML document. If they can validate, they are called validating parsers and if they cannot… well, they are called non-validating parsers!

Let us try to validate a Schema based document in an editor like XMLSpy which is built on a validating parser and see the output screens.

Notice that the validating parser found that the element “BOD” (mistyped instead of “BODY”) is not correct and reported the error.
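The output screens are not reproduced here. A document with this kind of mistake might look like the sketch below, which reuses the ARTICLES example from earlier in this chapter; the exact document used in the original screens is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ARTICLES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="D:\Articles.xsd">
<ARTICLE>
<ARTICLEDATA>
<TITLE/>
<AUTHOR/>
<ABSTRACT/>
<!-- Mistyped: the schema is assumed to declare BODY here, not BOD -->
<BOD/>
</ARTICLEDATA>
</ARTICLE>
</ARTICLES>
```

The document is still well-formed XML, so a non-validating parser would accept it without complaint. Only a validating parser that actually loads the schema can notice that BOD is not a declared element.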

Validation is particularly important when the destination of an XML document needs to know that the data being received conforms to the standard structure it expects. However, if validation is not important, it is often skipped, as it can lead to performance degradation, particularly in scenarios such as real-time XML data processing.

Wait… we haven’t finished yet!

In this chapter, we saw some of the reasons why schemas are particularly powerful when compared with DTDs. In particular, we noticed that all the constructs of the Schema use XML document syntax, making it very easy for XML parsers to deal with them. We looked at the structure of a Schema and how Elements and Attributes are specified and annotated. Finally, we learnt how schemas are referenced in an XML document and how such an XML document is validated against the referenced DTD or Schema (by using an example in an editor). The authoritative source on XML Schema specifications is here: http://www.w3.org/XML/Schema.

We also mentioned that we would talk about data types in the next chapter, and that is what we shall proceed to do now. Simple and Complex Types, as we will see, form the very core of Schemas. We will also briefly revisit the concept of Namespaces with respect to the concept of scoping. Our ascent is going smoothly… we are almost at Camp II!

Copyright© 2004-2006 Aleksey Nudelman