Tuesday, July 19, 2011

XML SAMPLE CODE

Extensible Markup Language (XML)
The markup language most widely used today is undoubtedly HyperText Markup Language (HTML), which is used to create Web pages. A markup language describes the structure of a document. HTML is an application of Standard Generalized Markup Language (SGML), and Web pages written in HTML use HTML's predefined set of tags. These days the Internet is widely used as a general means of communication, and as the data transferred over it grows in volume and becomes more complex to handle, many Web developers are turning to XML as an alternative to HTML. It's worth having a brief overview of this wonderful new markup language, which is changing the way data is handled on the Internet.
What is XML?
XML is a meta-markup language, which means that it lets us create our own markup languages (our own tags).
XML is popular for the following reasons:

  • It allows easy data exchange
  • It allows us to create customized markup languages
  • It makes the data in the document self-describing
  • It allows for structured and integrated data
The version of XML used here, and still the most widely used, is 1.0 (a later version, XML 1.1, also exists), and XML is case sensitive. Let's look at an example. Save the following code in a file with a .xml extension.

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
    <WELCOME>
    Welcome to XML
    </WELCOME>
    </DOCUMENT>

    Breaking down the above code:
    The document starts with the XML declaration <?xml version="1.0" encoding="UTF-8"?>. It is written like a processing instruction, and processing instructions start with <? and end with ?>. version="1.0" gives the version of XML the document uses.
    encoding="UTF-8" names the character encoding. UTF-8 is a variable-length encoding of Unicode that stores each character in one to four 8-bit bytes.
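To make "variable-length" concrete, here is a small Python sketch (Python is only this example's choice; utf8_sizes and the greeting text are hypothetical). It counts the bytes UTF-8 uses for each character and checks that the standard library's xml.etree.ElementTree reads non-ASCII text back correctly from a document that declares encoding="UTF-8".

```python
import xml.etree.ElementTree as ET


def utf8_sizes(text: str) -> list[int]:
    """Return how many bytes UTF-8 uses for each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


# Hypothetical greeting with non-ASCII characters: "Grüße € 😀".
MESSAGE = "Gr\u00fc\u00dfe \u20ac \U0001F600"
# Encode to bytes so the content really is UTF-8, as the declaration claims.
GREETING = (
    '<?xml version="1.0" encoding="UTF-8"?><WELCOME>' + MESSAGE + "</WELCOME>"
).encode("utf-8")

print(utf8_sizes("A\u00e9\u20ac\U0001F600"))    # [1, 2, 3, 4]  (A, é, €, 😀)
print(ET.fromstring(GREETING).text == MESSAGE)  # True
```

Plain ASCII characters such as the tags themselves take one byte each, which is why UTF-8 files containing only English text are no larger than ASCII files.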
    After the declaration comes the <DOCUMENT> element, which may or may not contain other elements and must always be closed with </DOCUMENT>. All other elements must appear between <DOCUMENT> and </DOCUMENT>, which makes <DOCUMENT> the root element of this XML document.
    The next element, <WELCOME>, sits between <DOCUMENT> and </DOCUMENT> and contains the message "Welcome to XML".
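The same structure can also be read by a program. This is a minimal sketch, assuming Python 3.9+ and its standard xml.etree.ElementTree module; read_welcome is a hypothetical helper name. Parsing only succeeds if the document is well-formed, and the result is the root element's name plus the text inside <WELCOME>.

```python
import xml.etree.ElementTree as ET

# The welcome document from the post, kept as bytes so that the parser
# decodes it using the encoding named in the XML declaration.
WELCOME_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
<WELCOME>
Welcome to XML
</WELCOME>
</DOCUMENT>
"""


def read_welcome(data: bytes) -> tuple[str, str]:
    """Return the root element's name and the trimmed text of <WELCOME>."""
    root = ET.fromstring(data)      # raises ET.ParseError if not well-formed
    welcome = root.find("WELCOME")  # first direct child named WELCOME
    return root.tag, welcome.text.strip()


print(read_welcome(WELCOME_XML))  # ('DOCUMENT', 'Welcome to XML')
```

The strip() call is needed because the line breaks around "Welcome to XML" are part of the element's text.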
    When opened in a browser, the XML document above looks like the image below.


    To format the content of the elements, we use a style sheet that tells the browser how the document should be displayed. Alternatively, programming languages such as Java and JavaScript can be used. Let's look at how the example above appears when it is formatted with a style sheet.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/css" href="style.css"?>
    <DOCUMENT>
    <WELCOME>
    Welcome to XML
    </WELCOME>
    </DOCUMENT>

    The above code includes a new line, <?xml-stylesheet type="text/css" href="style.css"?>, which tells the browser that the style sheet is a CSS (Cascading Style Sheets) file named style.css. XSL style sheets can also be used.
    The file style.css looks like this:

    WELCOME{font-size:40pt;font-family:Arial; color:red}

    This rule tells the browser to display the content of the <WELCOME> element in red, 40 pt Arial.
    You can customize different elements to display their content in different fonts and colors.
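The xml-stylesheet line is itself a processing instruction: its target is xml-stylesheet, and type and href are "pseudo-attributes" written inside its data. The Python sketch below assumes the standard xml.dom.minidom module, which keeps processing instructions that appear before the root element, and uses a deliberately simplified regular expression for the pseudo-attributes; stylesheet_links and PSEUDO_ATTR are hypothetical names.

```python
import re
from xml.dom import minidom

# The styled version of the welcome document from the post.
STYLED_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<DOCUMENT>
<WELCOME>
Welcome to XML
</WELCOME>
</DOCUMENT>
"""

# Simplified pattern for pseudo-attributes: name="value" or name='value'.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")


def stylesheet_links(data: bytes) -> list[dict[str, str]]:
    """Return the pseudo-attributes of every top-level xml-stylesheet instruction."""
    doc = minidom.parseString(data)
    links = []
    # Top-level nodes: prolog instructions/comments, the root element, trailing misc.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {name: value for name, _quote, value in PSEUDO_ATTR.findall(node.data)}
            links.append(attrs)
    return links


print(stylesheet_links(STYLED_XML))  # [{'type': 'text/css', 'href': 'style.css'}]
```

A browser does essentially this lookup before fetching style.css relative to the XML file's location.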
    Make sure that style.css is saved in the same directory as the XML file, because href="style.css" is a relative path. The output after adding the style sheet looks like the image below.


    XML is case sensitive, which means that <WELCOME> and <Welcome> are treated as different elements. A <WELCOME> start tag must be closed with a matching </WELCOME> end tag; </Welcome> will not do.
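A quick way to see case sensitivity in action is to give both variants to a parser. This sketch assumes Python's standard xml.etree.ElementTree; TWO_ELEMENTS and MISMATCHED are made-up test documents.

```python
import xml.etree.ElementTree as ET

# Two sibling elements whose names differ only in case.
TWO_ELEMENTS = b"<DOCUMENT><WELCOME>upper</WELCOME><Welcome>mixed</Welcome></DOCUMENT>"
# A start tag closed by an end tag that differs only in case.
MISMATCHED = b"<DOCUMENT><WELCOME>Welcome to XML</Welcome></DOCUMENT>"

root = ET.fromstring(TWO_ELEMENTS)
print(root.find("WELCOME").text)  # upper
print(root.find("Welcome").text)  # mixed
print(root.find("welcome"))       # None: no element has exactly this name

try:
    ET.fromstring(MISMATCHED)
    mismatch_rejected = False
except ET.ParseError as err:
    # The parser treats </Welcome> as a mismatched end tag for <WELCOME>.
    mismatch_rejected = True
    print("rejected:", err)
print(mismatch_rejected)  # True
```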
    Well-Formed XML Documents
    If an XML processor cannot successfully parse a document, it cannot format or process it. To guard against that, XML documents are subject to two constraints, well-formedness and validity, of which well-formedness is the more basic.
    Well-Formed Document
    As set out by the W3C, a well-formed XML document must match the document production, which consists of three parts:
    • a prolog
    • the root element
    • an optional miscellaneous part

    The prolog should include an XML declaration such as <?xml version="1.0"?>. It can also contain a document type declaration, which contains or refers to a Document Type Definition (DTD).
    The document must contain exactly one root element, which can hold other elements. All other elements must be enclosed within the root element.
    The optional miscellaneous part can be made up of XML comments, processing instructions and whitespace.
    The document must also follow the syntax rules specified in the W3C's XML 1.0 Recommendation.
    An example of a well-formed document is listed below:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
    <CONSUMER>
    <NAME>
    <FIRST_NAME>
    BEN
    </FIRST_NAME>
    <LAST_NAME>
    HOLLIAKE
    </LAST_NAME>
    </NAME>
    <PURCHASE>
    <ORDER>
    <ITEM>
    DVD
    </ITEM>
    <QUANTITY>
    1
    </QUANTITY>
    <PRICE>
    200
    </PRICE>
    </ORDER>
    </PURCHASE>
    </CONSUMER>
    <CONSUMER>
    <NAME>
    <FIRST_NAME>
    ADAM
    </FIRST_NAME>
    <LAST_NAME>
    ANDERSON
    </LAST_NAME>
    </NAME>
    <PURCHASE>
    <ORDER>
    <ITEM>
    VCR
    </ITEM>
    <QUANTITY>
    1
    </QUANTITY>
    <PRICE>
    150
    </PRICE>
    </ORDER>
    </PURCHASE>
    </CONSUMER>
    </DOCUMENT>

    Understanding the above document for well-formedness:
    The document starts with a prolog, which here consists of the XML declaration.
    The first element, the root element, is <DOCUMENT>, which contains all the other elements.
    Inside the root element are two <CONSUMER> elements, one for each consumer.
    For each consumer, the name is stored in the <NAME> element, which contains the <FIRST_NAME> and <LAST_NAME> elements.
    The details of each purchase are stored in an <ORDER> element inside the <PURCHASE> element. <ORDER> in turn contains the <ITEM>, <QUANTITY> and <PRICE> elements, which record the item purchased, the quantity and the price.
    The document ends with the closing </DOCUMENT> tag.
    Data can be stored for as many consumers as needed, and handling this kind of data is no problem for an XML processor.
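Because the structure is so regular, a program can walk it consumer by consumer. A minimal sketch with Python's standard xml.etree.ElementTree: orders_by_consumer is a hypothetical helper, the sample repeats the post's data with the whitespace inside elements condensed, and reading QUANTITY and PRICE as integers is an assumption (the post does not give units).

```python
import xml.etree.ElementTree as ET

# Same data as the document above, with the whitespace inside elements condensed.
CONSUMERS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
<CONSUMER>
<NAME><FIRST_NAME>BEN</FIRST_NAME><LAST_NAME>HOLLIAKE</LAST_NAME></NAME>
<PURCHASE><ORDER><ITEM>DVD</ITEM><QUANTITY>1</QUANTITY><PRICE>200</PRICE></ORDER></PURCHASE>
</CONSUMER>
<CONSUMER>
<NAME><FIRST_NAME>ADAM</FIRST_NAME><LAST_NAME>ANDERSON</LAST_NAME></NAME>
<PURCHASE><ORDER><ITEM>VCR</ITEM><QUANTITY>1</QUANTITY><PRICE>150</PRICE></ORDER></PURCHASE>
</CONSUMER>
</DOCUMENT>
"""


def orders_by_consumer(data: bytes) -> dict[str, list[tuple[str, int, int]]]:
    """Map each consumer's full name to a list of (item, quantity, price) orders."""
    root = ET.fromstring(data)
    result = {}
    for consumer in root.findall("CONSUMER"):  # direct children of <DOCUMENT>
        first = consumer.findtext("NAME/FIRST_NAME").strip()
        last = consumer.findtext("NAME/LAST_NAME").strip()
        result[f"{first} {last}"] = [
            (
                order.findtext("ITEM").strip(),
                int(order.findtext("QUANTITY")),  # int() ignores surrounding whitespace
                int(order.findtext("PRICE")),
            )
            for order in consumer.findall("PURCHASE/ORDER")
        ]
    return result


for name, orders in orders_by_consumer(CONSUMERS_XML).items():
    print(name, orders)
# BEN HOLLIAKE [('DVD', 1, 200)]
# ADAM ANDERSON [('VCR', 1, 150)]
```

Adding a third <CONSUMER> to the document needs no change to the code, which is the practical benefit of a fixed, nested structure.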
    The following are the basic rules to keep in mind when creating a well-formed XML document.

    • The document should start with an XML declaration
    • The document must contain one or more elements
    • Elements that are not empty must have both a start tag and an end tag (an empty element can use a single tag such as <ITEM/>)
    • All elements of the document must be contained within the root element
    • Elements must be nested correctly
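These rules, together with the prolog / root element / miscellaneous structure described earlier, can be tried out with a non-validating parser. In this sketch, which assumes Python's standard xml.etree.ElementTree (built on the expat parser), is_well_formed is a hypothetical helper and all test documents, including the app-note processing instruction, are made up. The declaration-less case is accepted because the XML 1.0 Recommendation says documents should, not must, begin with an XML declaration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """True if the non-validating expat parser accepts data."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


PROLOG = b'<?xml version="1.0" encoding="UTF-8"?>\n'

CASES = {
    # prolog, one root element, then a comment and a PI as the misc part
    "prolog+root+misc": PROLOG + b"<ORDER><ITEM>DVD</ITEM></ORDER>\n<!-- end -->\n<?app-note done?>\n",
    "no XML declaration": b"<ORDER><ITEM/></ORDER>",  # <ITEM/> is an empty-element tag
    "no elements at all": PROLOG,
    "missing end tag": PROLOG + b"<ORDER><ITEM>DVD</ORDER>",
    "bad nesting": PROLOG + b"<ORDER><ITEM>DVD</ORDER></ITEM>",
    "two root elements": PROLOG + b"<ORDER/><ORDER/>",
    "text after root": PROLOG + b"<ORDER/>>",
}

for label, data in CASES.items():
    print(f"{label:20} {is_well_formed(data)}")
```

Only the first two cases print True; each of the others breaks exactly one of the rules listed above.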

    Documents like the one above can be extended as far as needed; an XML processor can handle them as long as they remain well-formed.

    Valid XML Documents
    An XML document is said to be valid if it is well-formed, has a Document Type Definition (DTD) or XML Schema associated with it, and complies with that DTD or schema. A DTD specifies the structure of the document, not its content. Because many XML applications can share a common DTD, they can rely on the documents they exchange having the same structure. That is why DTDs are important.
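Well-formedness and validity really are separate checks. Python's standard-library XML parsers are non-validating, so the sketch below (a made-up document, xml.etree.ElementTree assumed) parses without complaint even though its body breaks its own internal DTD. Checking validity needs a validating parser, for example the DTD class in the third-party lxml library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: its internal DTD allows only CONSUMER children,
# but the body contains a WELCOME element, so it is well-formed yet NOT valid.
WELL_FORMED_NOT_VALID = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CONSUMER)*>
<!ELEMENT CONSUMER (#PCDATA)>
]>
<DOCUMENT><WELCOME>Welcome to XML</WELCOME></DOCUMENT>
"""

# expat reads the DOCTYPE but never checks the body against it.
root = ET.fromstring(WELL_FORMED_NOT_VALID)
print([child.tag for child in root])  # ['WELCOME']
```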
    Let's take another look at the example created in the Well-Formed XML Documents section.

    Adding a DTD to the example above makes the code look like this:  


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE DOCUMENT[
    <!ELEMENT DOCUMENT (CONSUMER)*>
    <!ELEMENT CONSUMER (NAME,PURCHASE)>
    <!ELEMENT NAME (FIRST_NAME,LAST_NAME)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT PURCHASE (ORDER)*>
    <!ELEMENT ORDER (ITEM,QUANTITY,PRICE)>
    <!ELEMENT ITEM (#PCDATA)>
    <!ELEMENT QUANTITY (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    ]>
    <DOCUMENT>
    <CONSUMER>
    <NAME>
    <FIRST_NAME>
    BEN
    </FIRST_NAME>
    <LAST_NAME>
    HOLLIAKE
    </LAST_NAME>
    </NAME>
    <PURCHASE>
    <ORDER>
    <ITEM>
    DVD
    </ITEM>
    <QUANTITY>
    1
    </QUANTITY>
    <PRICE>
    200
    </PRICE>
    </ORDER>
    </PURCHASE>
    </CONSUMER>
    <CONSUMER>
    <NAME>
    <FIRST_NAME>
    ADAM
    </FIRST_NAME>
    <LAST_NAME>
    ANDERSON
    </LAST_NAME>
    </NAME>
    <PURCHASE>
    <ORDER>
    <ITEM>
    VCR
    </ITEM>
    <QUANTITY>
    1
    </QUANTITY>
    <PRICE>
    150
    </PRICE>
    </ORDER>
    </PURCHASE>
    </CONSUMER>
    </DOCUMENT>

    Breaking down the DTD:
    Note the first line of the DTD, <!DOCTYPE DOCUMENT[. That line is the document type declaration. <!DOCTYPE is the syntax for declaring a DTD, and it must be followed by the name of the root element, which in this example is DOCUMENT.
    Each element is declared with an <!ELEMENT> declaration, which specifies whether the element contains parsed character data (#PCDATA, used for storing plain text) or other elements.
    In the example above, the DOCUMENT element is declared as <!ELEMENT DOCUMENT (CONSUMER)*>. The asterisk (*) indicates that the CONSUMER element can occur zero or more times inside DOCUMENT; in this example it occurs twice.
    The CONSUMER element is declared as (NAME,PURCHASE): the comma means it must contain a NAME element followed by a PURCHASE element, in that order. The NAME element in turn contains the elements FIRST_NAME and LAST_NAME.
    Both the FIRST_NAME and LAST_NAME elements are declared as #PCDATA, which allows them to hold plain text.
    The PURCHASE element is declared as (ORDER)*, so the asterisk (*) means that a PURCHASE element can contain zero or more ORDER elements.
    Each ORDER element in turn contains the elements ITEM, QUANTITY and PRICE, in that order.
    The elements ITEM, QUANTITY and PRICE are declared as #PCDATA as they hold only plain text.
    That's what a basic DTD looks like. A DTD written inside the document, like the one above, is called an internal DTD. We can also create external DTDs stored in separate files, and it is these external DTDs that allow different organizations to share a common document structure.
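Python's standard library cannot validate against a DTD, so here is a deliberately simplified, hypothetical sketch of what a validator does with the element declarations above. It is not a real DTD processor: CONTENT_MODELS is a hand translation of this one DTD into regular expressions over child-element names, check and validate are made-up helper names, and attributes and entities are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the post's DTD (hypothetical data, not parsed from DTD
# syntax). Each model is a regex over child names, each followed by a space;
# None stands for (#PCDATA), i.e. text only.
CONTENT_MODELS = {
    "DOCUMENT": r"(CONSUMER )*",       # (CONSUMER)*
    "CONSUMER": r"NAME PURCHASE ",     # (NAME,PURCHASE)
    "NAME": r"FIRST_NAME LAST_NAME ",  # (FIRST_NAME,LAST_NAME)
    "PURCHASE": r"(ORDER )*",          # (ORDER)*
    "ORDER": r"ITEM QUANTITY PRICE ",  # (ITEM,QUANTITY,PRICE)
    "FIRST_NAME": None, "LAST_NAME": None,
    "ITEM": None, "QUANTITY": None, "PRICE": None,
}
ROOT_NAME = "DOCUMENT"  # the name given in <!DOCTYPE DOCUMENT[


def check(elem: ET.Element) -> list[str]:
    """Recursively compare elem and its descendants with CONTENT_MODELS."""
    if elem.tag not in CONTENT_MODELS:
        return [f"<{elem.tag}> is not declared"]
    errors = []
    model = CONTENT_MODELS[elem.tag]
    children = list(elem)
    if model is None:
        if children:
            errors.append(f"<{elem.tag}> allows only text, found child elements")
    else:
        # Element-only content: text around the children must be whitespace.
        if any(t and t.strip() for t in [elem.text] + [c.tail for c in children]):
            errors.append(f"<{elem.tag}> allows only elements, found text")
        names = "".join(c.tag + " " for c in children)
        if re.fullmatch(model, names) is None:
            errors.append(f"<{elem.tag}> has children [{names.strip()}], which do not match its model")
    for child in children:
        errors.extend(check(child))
    return errors


def validate(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the document conforms."""
    root = ET.fromstring(data)  # well-formedness is checked first
    if root.tag != ROOT_NAME:
        return [f"root is <{root.tag}>, expected <{ROOT_NAME}>"]
    return check(root)


GOOD = b"""<DOCUMENT><CONSUMER>
<NAME><FIRST_NAME>BEN</FIRST_NAME><LAST_NAME>HOLLIAKE</LAST_NAME></NAME>
<PURCHASE><ORDER><ITEM>DVD</ITEM><QUANTITY>1</QUANTITY><PRICE>200</PRICE></ORDER></PURCHASE>
</CONSUMER></DOCUMENT>"""
# PRICE is missing from the ORDER, so the (ITEM,QUANTITY,PRICE) model fails.
BAD = GOOD.replace(b"<PRICE>200</PRICE>", b"")

print(validate(GOOD))  # []
print(validate(BAD))
```

Because (CONSUMER)* allows zero occurrences, an empty <DOCUMENT/> also passes this check.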
    For more information about declaring attributes, adding comments and other DTD features, please refer to the W3C XML 1.0 specification. The image below shows how the XML document above looks when opened in a browser.


