codeObserver
codeObserver

Reputation: 6647

Algorithm for Parsing an XML

I was reading one of the interview question about parsing an XML.

I wrote a very high level and incomplete skeleton of the algorithm , and was looking for some help in writing a simple algo [I assume one exists, because this was asked as an interview question so should be doable in 45 mins I guess].

Here is my attempt:

   // Assume well-formedness
    public static Node parseXML(String xml)
    {

        Node root = new XMLParse().new Node();

        while(!helper.endOfElements())
        {
            // Returns name of root element
            root.name = helper.getName(xml);
            // Returns value of root element
            root.name = helper.getValue(xml);

            // returns each child in a String and returns all such children as 
            // a String Array
            // Basically splits based on <> </> elements and return that as a String
            String[] children = helper.getChildren(xml);

            if(children.length!=0)
            {
                root.childList = new ArrayList<XMLParse.Node>();
                for(int i=0; i<children.length;i++)
                {
                    root.childList.add(parseXML(children[i]));
                }
            }

        }


        return root;
    }


    class Node
    {
        String name;
        String value;
        List<Node> childList;

        public String getName()
        {
            return name;
        }

        public String getValue()
        {
            return value;
        }

        public List<Node> getChildList()
        {
            return childList;
        }

    }


Class helper()
{

// Returns the name of the root of the xml
public static String getName(String XML);

// Returns the value of the root of the xml
public static String getValue(String XML)

// Splits the XML into top level childern of the root of the passed XML
public static String[] getChildren(String XML)

}

I am hoping someone can give me a pseudo-code/code for doing this or may be provide an easy way of implementing the helper function in my algo.

I know there are built in classes to do this like in here , but using them would beat the purpose I guess. Also, many things used in this link are just interfaces so I couldnt find any implementation of say docBuilder.parse (new File("book.xml")) method.

Upvotes: 2

Views: 11093

Answers (1)

ccozad
ccozad

Reputation: 1129

About the actual question

This is an example of an interview question that is designed to expose the inexperience of the applicant. An applicant that hasn't been around the industry much (or who just put XML on their resume without ever having implemented a project) might eagerly go off and start churning code.

An applicant that has successfully navigated a few interviews or completed a few projects that use XML will step back and make a few observations

  • Writing and testing a standards compliant parser is not going to be possible in an interview setting.
    • This would take weeks, months or longer. If a question is asked but the answer takes too long, the interviewer is really asking a different question
    • With enough experience you can see these "trick questions" and respond accordingly (Ask for more information about what the interviewer is looking for)
  • The question asks for two ways to parse XML.
    • Is this a coincidence? No. There are two popular techniques of processing XML.
    • The question could have been "Describe two popular ways to parse XML and discuss their pros and cons" Yes that would have been clearer, but the interviewer is also testing how you respond to situations where the requirements are unclear.

About the solution

In an interview setting I would answer something along the lines of:

I don't think it will be productive to write an actual parser here during the interview. This is because standards compliant parsers are time consuming to write and validate and there are already plenty of implementations for different languages. However, I will describe how to use two popular parsers and how I have used them on past projects. I'll also discuss the pros and cons of each approach.

The first approach that can be used when parsing XML is to treat the entire document as a tree of nodes. This type of parser is called a DOM parser. I used a DOM parser on {insert relevant project experience}. We used a DOM parser because we needed to access different parts of the document at the same time.

The second approach that can be used when parsing XML is to treat the document as a stream of events or nodes. This type of approach is called a SAX parser, I used a SAX parser on {insert relevant project experience}. We used a SAX parser because we couldn't fit the entire document in memory.

{insert discussion on pros and cons}

Further reading

http://www.cs.nmsu.edu/~epontell/courses/XML/material/xmlparsers.html

Upvotes: 21

Related Questions