XML Interview Questions


XML interview questions mainly revolve around the DOM and SAX parsers. They check that you have done XML parsing and know which parser to use in which scenario. If the application is web-service or messaging based, expect more questions on this topic, because XML is central to application messaging. Knowing XML parsing and how it is used in messaging will help you clear an XML-based interview.


What is the difference between SAX parser and DOM parser?
DOM parser - reads the whole XML document and returns a DOM tree representation of it. It provides a convenient way to read, analyze and manipulate XML files. It is not well suited to large XML files, because it always reads the whole file into memory before processing.
SAX parser - works incrementally and generates events that are passed to the application. It does not build a data representation of the XML content, so more programming is required. However, it supports stream processing and partial processing, which a DOM parser alone cannot do.
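To make the DOM side of the comparison concrete, here is a minimal sketch that loads a small document into a tree and then queries it. The class name DomTitles and the inline sample XML are assumptions made for illustration; the parser calls come from the standard javax.xml.parsers and org.w3c.dom packages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTitles {
    // Builds the full DOM tree in memory, then queries it for <title> elements.
    static List<String> titles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input, small enough to hold in memory
        String xml = "<shiporder>"
                + "<item><title>Empire Burlesque</title></item>"
                + "<item><title>Hide your heart</title></item>"
                + "</shiporder>";
        System.out.println(titles(xml)); // prints [Empire Burlesque, Hide your heart]
    }
}
```

Because the whole tree is available after parse() returns, the code can look up elements in any order. That convenience is what costs memory on large files.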

What is XSL?
XSL (Extensible Stylesheet Language) is a family of languages for transforming and presenting XML documents. It consists of three parts:
XSLT - a language for transforming XML documents
XSLT is used to transform an XML document into another XML document, or into another type of document that a browser recognizes, such as HTML or XHTML. Normally XSLT does this by transforming each XML element into an (X)HTML element.
XPath - a language for navigating in XML documents
XSL-FO - a language for formatting XML documents
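As a rough illustration of XSLT and XPath working together, the sketch below applies a tiny stylesheet from Java using the standard javax.xml.transform API. The stylesheet, the element names (notes, note) and the class name XsltDemo are hypothetical; the match and select attributes hold XPath expressions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDemo {
    // Hypothetical stylesheet: wraps the text of every <note> in a <p> element.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/notes'>"
        + "<div><xsl:for-each select='note'><p><xsl:value-of select='.'/></p></xsl:for-each></div>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML and returns the result as text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<notes><note>Hello</note><note>World</note></notes>"));
    }
}
```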

Why Use a DTD?
A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes.
A DTD can be declared inline inside an XML document, or as an external reference.
Seen from a DTD point of view, all XML documents (and HTML documents) are made up of the following building blocks:
  • Elements
  • Attributes
  • Entities
  • PCDATA
  • CDATA
Elements
Elements are the main building blocks of both XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
Examples:
<body>some text</body>
<message>some text</message>

Attributes
Attributes provide extra information about elements.
Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty, it is closed with " />".

Entities
Some characters have a special meaning in XML, like the less-than sign (<), which marks the start of an XML tag. To use such a character as ordinary text, you write an entity reference instead.
Most of you know the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '

PCDATA
PCDATA means parsed character data.
Think of character data as the text found between the start tag and the end tag of an XML element.
PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup.
Tags inside the text will be treated as markup and entities will be expanded.
However, parsed character data should not contain any &, <, or > characters; these need to be represented by the &amp; &lt; and &gt; entities, respectively.

CDATA
CDATA means character data.
CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
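The difference between parsed and unparsed text can be seen with a short experiment. This is a sketch only: the class name CdataDemo and the msg element are made up, and the unparsed case is written as a CDATA section, which is how unparsed text appears inside element content.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataDemo {
    // Returns the text of the root element exactly as the application sees it after parsing.
    static String text(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Parsed character data: the entity references are expanded.
        System.out.println(text("<msg>1 &lt; 2 &amp; 3</msg>"));
        // prints 1 < 2 & 3

        // CDATA section: nothing inside is treated as markup or expanded.
        System.out.println(text("<msg><![CDATA[1 &lt; 2 <b>bold</b>]]></msg>"));
        // prints 1 &lt; 2 <b>bold</b>
    }
}
```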

With a DTD, each of your XML files can carry a description of its own format.
With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.
Your application can use a standard DTD to verify that the data you receive from the outside world is valid.
You can also use a DTD to verify your own data.

Sample DTD
<!DOCTYPE TVSCHEDULE [
<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER,DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>
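One way to check a document against a DTD in Java is to switch on validation in the DOM parser and treat every reported error as a failure. The sketch below assumes a trimmed-down variant of the TVSCHEDULE DTD (only CHANNEL and BANNER are kept) and a made-up class name, DtdValidation. It is an illustration, not a complete validator.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidation {
    // Simplified internal DTD subset, assumed for this example only.
    static final String DTD =
          "<!DOCTYPE TVSCHEDULE ["
        + "<!ELEMENT TVSCHEDULE (CHANNEL+)>"
        + "<!ELEMENT CHANNEL (BANNER)>"
        + "<!ELEMENT BANNER (#PCDATA)>"
        + "<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>"
        + "<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>"
        + "]>";

    // Returns true if the document body is valid against the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity problems are only reported to the handler, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed or not valid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<TVSCHEDULE NAME='Demo'><CHANNEL CHAN='1'><BANNER>News</BANNER></CHANNEL></TVSCHEDULE>";
        String bad = "<TVSCHEDULE><CHANNEL CHAN='1'><BANNER>News</BANNER></CHANNEL></TVSCHEDULE>";
        System.out.println(isValid(good)); // true
        System.out.println(isValid(bad));  // false: required NAME attribute is missing
    }
}
```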

What is an XML Schema?
The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
An XML Schema:
  • defines elements that can appear in a document
  • defines attributes that can appear in a document
  • defines which elements are child elements
  • defines the order of child elements
  • defines the number of child elements
  • defines whether an element is empty or can include text
  • defines data types for elements and attributes
  • defines default and fixed values for elements and attributes

XML Schemas are the Successors of DTDs
XML Schemas are widely used in Web applications and web services as a replacement for DTDs. Here are some reasons:
  • XML Schemas are extensible to future additions
  • XML Schemas are richer and more powerful than DTDs
  • XML Schemas are written in XML
  • XML Schemas support data types
  • XML Schemas support namespaces

XML Schema Details:
This section demonstrates how to write an XML Schema. You will also see that a schema can be written in different ways.
An XML Document
Let's have a look at this XML document called "shiporder.xml":
<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="shiporder.xsd">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>4000 Stavanger</city>
    <country>Norway</country>
  </shipto>
  <item>
    <title>Empire Burlesque</title>
    <note>Special Edition</note>
    <quantity>1</quantity>
    <price>10.90</price>
  </item>
  <item>
    <title>Hide your heart</title>
    <quantity>1</quantity>
    <price>9.90</price>
  </item>
</shiporder>
The XML document above consists of a root element, "shiporder", that contains a required attribute called "orderid". The "shiporder" element contains three different child elements: "orderperson", "shipto" and "item". The "item" element appears twice, and it contains a "title", an optional "note" element, a "quantity", and a "price" element.
The attribute xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declares the XML Schema instance namespace, which signals that this document is meant to be validated against a schema. The attribute xsi:noNamespaceSchemaLocation="shiporder.xsd" specifies where the schema resides (here, in the same folder as "shiporder.xml").

Create an XML Schema
Now we want to create a schema for the XML document above.
We start by opening a new file that we will call "shiporder.xsd". To create the schema, we can simply follow the structure of the XML document and define each element as we find it. We start with the standard XML declaration, followed by the xs:schema element that defines a schema:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
...
</xs:schema>
In the schema above we use the conventional prefix xs, bound to the XML Schema namespace http://www.w3.org/2001/XMLSchema, which identifies the schema language itself.
Next, we have to define the "shiporder" element. This element has an attribute and contains other elements, so we treat it as a complex type. The child elements of the "shiporder" element are surrounded by an xs:sequence element, which defines an ordered sequence of sub-elements:
<xs:element name="shiporder">
  <xs:complexType>
    <xs:sequence>
      ...
    </xs:sequence>
  </xs:complexType>
</xs:element>
Then we have to define the "orderperson" element as a simple type (because it does not contain any attributes or other elements). The type (xs:string) carries the namespace prefix associated with XML Schema, which indicates a predefined schema data type:
<xs:element name="orderperson" type="xs:string"/>
Next, we have to define two elements that are of the complex type: "shipto" and "item". We start by defining the "shipto" element:
<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="address" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
With schemas we can define the number of possible occurrences of an element with the maxOccurs and minOccurs attributes. maxOccurs specifies the maximum number of occurrences and minOccurs specifies the minimum number of occurrences. The default value for both maxOccurs and minOccurs is 1.
Now we can define the "item" element. This element can appear multiple times inside a "shiporder" element. This is specified by setting the maxOccurs attribute of the "item" element to "unbounded", which means that there can be as many occurrences of the "item" element as the author wishes. Notice that the "note" element is optional. We have specified this by setting the minOccurs attribute to zero:

<xs:element name="item" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="note" type="xs:string" minOccurs="0"/>
      <xs:element name="quantity" type="xs:positiveInteger"/>
      <xs:element name="price" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

We can now declare the attribute of the "shiporder" element. Since this is a required attribute we specify use="required".
Note: The attribute declarations must always come last:
<xs:attribute name="orderid" type="xs:string" use="required"/>
Here is the complete listing of the schema file called "shiporder.xsd":
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="note" type="xs:string" minOccurs="0"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
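A schema like this is typically applied from code. The following sketch uses the standard javax.xml.validation API; to stay short it embeds only the "item" part of the schema as a string, and the class name XsdValidation is an assumption. In a real project the schema would more likely be loaded from the "shiporder.xsd" file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidation {
    // The "item" element from the schema above, declared here as the root for brevity.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='item'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='note' type='xs:string' minOccurs='0'/>"
        + "<xs:element name='quantity' type='xs:positiveInteger'/>"
        + "<xs:element name='price' type='xs:decimal'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true if the XML text is valid against the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // validate() throws SAXException on the first validation error
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "note" is optional (minOccurs='0'), so leaving it out is fine
        System.out.println(isValid(
            "<item><title>Hide your heart</title><quantity>1</quantity><price>9.90</price></item>")); // true
        // 0 is not a positiveInteger
        System.out.println(isValid(
            "<item><title>Hide your heart</title><quantity>0</quantity><price>9.90</price></item>")); // false
    }
}
```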

Divide the Schema
The previous design method is very simple, but can be difficult to read and maintain when documents are complex.
The next design method is based on defining all elements and attributes first, and then referring to them using the ref attribute.
Here is the new design of the schema file ("shiporder.xsd"):
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- definition of simple elements -->
  <xs:element name="orderperson" type="xs:string"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="address" type="xs:string"/>
  <xs:element name="city" type="xs:string"/>
  <xs:element name="country" type="xs:string"/>
  <xs:element name="title" type="xs:string"/>
  <xs:element name="note" type="xs:string"/>
  <xs:element name="quantity" type="xs:positiveInteger"/>
  <xs:element name="price" type="xs:decimal"/>
  <!-- definition of attributes -->
  <xs:attribute name="orderid" type="xs:string"/>
  <!-- definition of complex elements -->
  <xs:element name="shipto">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="address"/>
        <xs:element ref="city"/>
        <xs:element ref="country"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
        <xs:element ref="note" minOccurs="0"/>
        <xs:element ref="quantity"/>
        <xs:element ref="price"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="orderperson"/>
        <xs:element ref="shipto"/>
        <xs:element ref="item" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="orderid" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Using Named Types
The third design method defines classes, or types, that enable us to reuse element definitions. This is done by naming the simpleType and complexType elements, and then pointing to them through the type attribute of the element.
Here is the third design of the schema file ("shiporder.xsd"):
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="stringtype">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="inttype">
    <xs:restriction base="xs:positiveInteger"/>
  </xs:simpleType>
  <xs:simpleType name="dectype">
    <xs:restriction base="xs:decimal"/>
  </xs:simpleType>
  <xs:simpleType name="orderidtype">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{6}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="shiptotype">
    <xs:sequence>
      <xs:element name="name" type="stringtype"/>
      <xs:element name="address" type="stringtype"/>
      <xs:element name="city" type="stringtype"/>
      <xs:element name="country" type="stringtype"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="itemtype">
    <xs:sequence>
      <xs:element name="title" type="stringtype"/>
      <xs:element name="note" type="stringtype" minOccurs="0"/>
      <xs:element name="quantity" type="inttype"/>
      <xs:element name="price" type="dectype"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="shipordertype">
    <xs:sequence>
      <xs:element name="orderperson" type="stringtype"/>
      <xs:element name="shipto" type="shiptotype"/>
      <xs:element name="item" maxOccurs="unbounded" type="itemtype"/>
    </xs:sequence>
    <xs:attribute name="orderid" type="orderidtype" use="required"/>
  </xs:complexType>
  <xs:element name="shiporder" type="shipordertype"/>
</xs:schema>

The restriction element indicates that the datatype is derived from a built-in W3C XML Schema datatype. So the following fragment means that the value of the element or attribute must be a string:
<xs:restriction base="xs:string">
The restriction element is more often used to constrain the values that are allowed. Look at the following lines from the schema above:
<xs:simpleType name="orderidtype">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{6}"/>
  </xs:restriction>
</xs:simpleType>

This indicates that the value of the element or attribute must be a string of exactly six characters, each of which is a digit from 0 to 9.
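The effect of the pattern can be tried out with an ordinary Java regular expression. This is only an approximation for illustration: XML Schema has its own regular expression dialect, but for a simple expression such as [0-9]{6} the two behave the same, provided the whole value has to match. The class name OrderIdCheck is made up.

```java
import java.util.regex.Pattern;

class OrderIdCheck {
    // Same expression as the xs:pattern facet in "orderidtype"
    private static final Pattern ORDER_ID = Pattern.compile("[0-9]{6}");

    static boolean isValidOrderId(String value) {
        // matches() requires the whole input to match, like an XSD pattern facet
        return ORDER_ID.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidOrderId("889923"));  // true
        System.out.println(isValidOrderId("88992"));   // false: only five digits
        System.out.println(isValidOrderId("88992A"));  // false: contains a letter
    }
}
```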

What is the difference between a DTD and an XML Schema?
A DTD is:
The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD.
The document type declaration can point to an external subset containing markup declarations, can contain the markup declarations directly in an internal subset, or can do both.

A Schema is:
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents.

In summary, schemas are a richer and more powerful way of describing information than is possible with DTDs.

Code example of SAX XML parsing?
XML parsing with a SAX parser
// Get an instance of the SAX parser factory
SAXParserFactory factory = SAXParserFactory.newInstance();
// Get a SAX parser instance
SAXParser saxParser = factory.newSAXParser();
// Define our own handler; it prints the text of every EMAIL element
DefaultHandler handler = new DefaultHandler() {
    // true while the parser is inside an EMAIL element
    boolean inEmail = false;

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes)
            throws SAXException {
        if (qName.equalsIgnoreCase("EMAIL")) {
            inEmail = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (inEmail) {
            System.out.println("Email: "
                    + new String(ch, start, length));
            inEmail = false;
        }
    }
};
// Parse the document (fileName is the path or URI of the XML file)
saxParser.parse(fileName, handler);

Sample usage of the SAX parser factory above:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Main {
    public static void main(String[] argv) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        SaxHandler handler = new SaxHandler();
        parser.parse("sample.xml", handler);
    }
}

// Prints the "number" and "date" attributes of every <order> element
class SaxHandler extends DefaultHandler {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException {
        if (qName.equals("order")) {
            String date = attrs.getValue("date");
            String number = attrs.getValue("number");
            System.out.println("Order #" + number + " date is '" + date + "'");
        }
    }
}
