
XML Tutorial: SGML, HTML and XML
Author: Jaidev
Concise History of SGML, HTML and XML
The figure below depicts how XML evolved in response to the changing challenges of the times, and where it sits in time relative to SGML and HTML.

Figure 1: Evolution of XML
This chapter will describe each of these markup languages/standards in sufficient detail so we have a firm understanding of the place of each in the information age. In particular, we will see how SGML has acted as a starting point for both HTML and XML and how HTML and XML seek to address two very different needs – presentation and representation respectively.
SGML: Standardizing Electronic Publishing
Electronic Publishing
"Electronic Publishing" is defined as the process of disseminating textual and/or multimedia information via the electronic network (internet, local networks etc) or through standalone electronic media (CD, DVD etc).
When publishing documents electronically two factors are critical:
- The capacity to physically communicate with the channel (protocols and media formats)
- The capacity to understand the structure (parts and relationships) and presentation (fonts, typefaces etc) of the content.
SGML, or the Standard Generalized Markup Language, sought to give a formal standard for the latter factor – structure and presentation. As explained in the previous chapter, SGML was created under the aegis of the International Organization for Standardization (ISO) in 1986. Although SGML itself has never been immensely popular, it has served as the foundation for more widely accepted and extensively used standards such as HTML and XML.
What is SGML? It is a standard language to represent document structures but is not a document structure itself. The actual document structures are defined by various industry verticals who need to standardize their electronic documents.
For instance:
- the web community created HTML, based on SGML, for publishing web pages;
- OASIS (Organization for the Advancement of Structured Information Standards) maintains the DocBook standard, originally based on SGML, for publishing technical manuscripts; and
- the WAP Forum defined WML as the standard for sending documents to wireless devices.
So how is a document structure represented in a standard way?
SGML is a standard language to represent document structures. It is not a document structure itself.
The Document Type Definition (aka DTD)
The DTD, or Document Type Definition, was the single most important contribution of SGML. While DTDs will be discussed in more detail later in Chapter 7, let us see by means of an example what an SGML document actually looks like. We consider a very basic “article” document, which we will use as an example throughout this chapter.

In the example above, we have two portions.
- The data portion of the document lies between the line marked <article> and the line ending </article>, both lines inclusive. Everything between the angle brackets < and > is known as a “tag”, and tags “mark up” the actual content. For instance, the tag <title> marks up the content “You and me”.
- The structure portion of the document runs from the first line, starting <!DOCTYPE, to the line ending ]>
The structure portion is also commonly known as the Document Type Definition or DTD. Some of the elements of the above DTD are “header”, “body”, “title” etc, which define the structure of an “article”. We could simply extend this article DTD by adding another element called (say) “footer” to the DTD, and our “article” can now magically carry “footers” as well! That is the power of DTDs.
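As a rough illustration of the idea, the sketch below embeds a small DTD in a document and lists the elements that the DTD declares. Everything in it is an assumption made for illustration: the article content, the “footer” element and the helper name declared_elements are hypothetical, and the document is written in XML syntax (which Python's standard library can read) rather than being the SGML example referred to above. The parser used here only reports the declarations; it does not validate the document against them.

```python
import xml.parsers.expat

# Hypothetical article with an internal DTD; "footer" is the added element.
DOC = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ELEMENT article (header, body, footer)>
  <!ELEMENT header (title)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ELEMENT footer (#PCDATA)>
]>
<article>
  <header><title>You and me</title></header>
  <body>Some text.</body>
  <footer>The end.</footer>
</article>
"""


def declared_elements(doc):
    """Return the element names declared in the document's DTD, in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls this handler once per <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(doc, True)
    return names


print(declared_elements(DOC))
```

Removing the “footer” declaration (and the footer itself) gives back the smaller article structure; adding it is all it takes for the DTD to describe articles with footers.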
We shall learn more about XML DTDs later. For now, having got a flavor of the mother of all markups, SGML, let us move on to the next exciting thing that happened in the 1990s: the World Wide Web and HTML.
HTML: The Web Revolution
The World Wide Web (www!) was beginning to happen in the early 1990s. It revolutionized the way information was disseminated around the globe. It broke down barriers between nations and people, and PCs suddenly became the next most watched thing after the television!
But there was a drawback. Different “browsers” did not show the same content in the same, or even a similar, way. This was because the presentation mechanisms on the internet had not been standardized.
People looked to SGML to provide a solution. Its most popular application yet, HTML, came into existence in 1991 along with Tim Berners-Lee’s first web browser.
HTML (acronym for HyperText Markup Language) offers a way to represent pages so that web browsers can render them in as consistent a manner as possible. Let us take a look at our earlier “article” document in HTML now.

If you look closely, the emphasis here is on the presentation of the content rather than its meaning. Gone are tags such as <header> and <title>, which described what the document means. Instead, there are tags for paragraphs <p>, boldface <b>, italics <i> and headings <h1>, which tell you how the document looks.
Since we mentioned the word “looks”, let us now see how the document renders in the Internet Explorer web browser.

Looks nice doesn’t it?! It certainly does!
But the lack of meaning implies we cannot do many “intelligent” things with this. For instance:
- We cannot use it for any further automated data processing such as forming a list of titles and authors for a table of contents.
- We cannot have it displayed on our cell phone – only on our PC web browser.
This emphasis on presentation was sufficient in the early stages of the World Wide Web, since presentation was what people cared about most at the time. However, as business-to-business and business-to-client transactions over multiple devices started to grow in the mid-1990s, the focus shifted once again from presentation to the representation of content.
Once again people turned to SGML as a basis, since it was the established root standard in markup. SGML was simplified and refined, giving rise to the next generation of markup language standards: XML.
XML: Portable Data
In 1996 the first public draft of the eXtensible Markup Language (XML) was released by the World Wide Web Consortium (W3C).
Since we are going to look at XML over many sessions to come, I will not bore you with any detail here. Instead, we will look at our “article” one last time – now as XML and briefly outline how it is different from the previous versions.

Notice a few interesting things.
- The presentation tags of HTML have disappeared. However, as we will discover later, we can transform this data to apply presentation information in a uniform way using XSLT.
- Unlike in SGML, opening tags such as <header> and <title> must now have closing tags as well, </header> and </title> respectively.
- The DTD is absent, since it is now optional.
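To make these points concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The article content, the <author> and <para> elements and the helper names toc_entry and is_well_formed are assumptions invented for this illustration; they are not taken from the example above. The sketch shows that a document without a DTD can still be parsed, that its meaning-bearing tags allow automated processing such as building a table-of-contents line, and that a missing closing tag is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical article, invented for illustration; note there is no DTD.
ARTICLE_XML = """<article>
  <header>
    <title>You and me</title>
    <author>A. Writer</author>
  </header>
  <body>
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
  </body>
</article>"""


def toc_entry(xml_text):
    """Return a 'title, by author' line for a table of contents."""
    root = ET.fromstring(xml_text)
    title = root.findtext("header/title")
    author = root.findtext("header/author")
    return f"{title}, by {author}"


def is_well_formed(xml_text):
    """Report whether the text parses as XML (every tag properly closed)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(toc_entry(ARTICLE_XML))
# The <title> tag below is never closed, so the parser rejects the document.
print(is_well_formed("<article><title>You and me</article>"))
```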
XML facilitated the easy transformation of data from one usable form to another in a standards-based manner. It made it possible for information to flow across businesses and from businesses to people, no matter how they accessed this information. The wide range of media, channels and end points was no longer a hindrance to the dissemination of knowledge.
Anyway, that is all for XML at present. We have a long journey ahead and hopefully will get a very good hang of what XML means by the time we are through.
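The kind of transformation described above can be sketched in a few lines. This tutorial names XSLT as the tool for the job; the sketch below is not XSLT but a plain Python stand-in that performs a similar conversion by hand, turning a meaning-oriented article into presentation-oriented HTML. The article content, the <para> element and the function name article_to_html are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML article, invented for illustration.
ARTICLE_XML = """<article>
  <header>
    <title>You and me</title>
  </header>
  <body>
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
  </body>
</article>"""


def article_to_html(xml_text):
    """Map the article's title to <h1> and each paragraph to <p>."""
    src = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    heading = ET.SubElement(body, "h1")
    heading.text = src.findtext("header/title")
    for para in src.iterfind("body/para"):
        p = ET.SubElement(body, "p")
        p.text = para.text
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(html, encoding="unicode")


print(article_to_html(ARTICLE_XML))
```

The same source document could be fed to a different function that produces, say, plain text for a small device, which is the point of keeping the data free of presentation tags.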
To the frontlines now!
In this chapter we traced the beginnings of XML. Hopefully you feel oriented in time at least! Now let us move forward and understand what we need to know about XML in essence. That will be our base camp.
From then on we start the ascent along one of the trails… and I will be your guide!
Copyright© 2004-2006 Aleksey Nudelman