Reputation: 369
I want to check whether an XML document (held in a String object) is well formed. For example:
"<root> Hello StackOverflow! <a> Something here </a> Goodbye StackOverflow </root>"
It should also validate attributes, but I'm still a long way from that. For now I just want to make sure I have the logic right. Here's what I've got so far, but I'm stuck and need some help.
    public boolean isWellFormed( String str )
    {
        boolean retorno = true;
        if ( str == null )
        {
            throw new NullPointerException();
        }
        else
        {
            this.chopTheElements( str );
            this.chopTags();
        }
        return retorno;
    }

    private void chopTags()
    {
        for ( String element : this.elements )
        {
            this.tags.add( element.substring( 1, element.length()-1 ) );
        }
    }

    public void chopTheElements( String str )
    {
        for ( int i = 0; i < str.length(); i++ )
        {
            if ( str.charAt( i ) == '<' )
            {
                elements.add( getNextToken( str.substring( i ) ) );
            }
        }
    }

    private String getNextToken( String str )
    {
        String retStr = "";
        if ( str.indexOf( ">" ) != -1 )
        {
            retStr = str.substring( 0, str.indexOf( ">" ) + 1 );
        }
        return retStr;
    }
So far I have chopped the elements (like "<root>") into one list, and then the tags into another, like this: root, /root.
But I don't know how to proceed, or whether I'm going in the right direction. I have been assigned to solve this without regex.
Any advice? I'm lost here. Thanks.
Upvotes: 0
Views: 839
Reputation: 163360
Starting by breaking the string when you see a "<" is not the way to go about it, because the chunks you identify will be unrelated to the hierarchic structure of the XML. For example, if you have as input:
<a>xxx<b>...</b>yyy</a>
then one of your chunks will be "/b>yyy<" which isn't a useful thing to break up further.
You need to structure your code according to the structure of the grammar. If the grammar says that an element consists of a start tag then a sequence of (elements or characters) then an end tag, then you need a method that matches that sequence, and calls other methods to process its components. Because the grammar is recursive, your code will be recursive, so this is known as recursive descent parsing. It's something that is often taught in computer science courses so you'll find excellent coverage of the topic in textbooks.
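As a rough illustration of that idea, here is a minimal sketch of a recursive descent checker. It assumes a deliberately simplified grammar: element := '<' name '>' (element | text)* '</' name '>', with exactly one root element, tag names made only of letters and digits, and no attributes, comments, entities or self-closing tags. The class name WellFormedChecker and its method names are made up for this example.

```java
// Simplified grammar (an assumption, not full XML):
//   element := '<' name '>' content '</' name '>'
//   content := (element | text)*
final class WellFormedChecker {
    private final String input;
    private int pos = 0;

    private WellFormedChecker(String input) {
        this.input = input;
    }

    static boolean isWellFormed(String xml) {
        if (xml == null) {
            throw new NullPointerException("xml");
        }
        WellFormedChecker parser = new WellFormedChecker(xml.trim());
        try {
            parser.parseElement();
            // Anything left over after the root element is an error.
            return parser.pos == parser.input.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Matches one element; recurses through parseContent for nested elements.
    private void parseElement() {
        expect('<');
        String openName = parseName();
        expect('>');
        parseContent();
        expect('<');
        expect('/');
        String closeName = parseName();
        if (!closeName.equals(openName)) {
            throw new IllegalStateException("mismatched end tag: " + closeName);
        }
        expect('>');
    }

    // Consumes text and child elements until an end tag or the end of input.
    private void parseContent() {
        while (pos < input.length()) {
            if (input.startsWith("</", pos)) {
                return; // the end tag belongs to the enclosing element
            }
            if (input.charAt(pos) == '<') {
                parseElement();
            } else {
                pos++; // plain character data
            }
        }
    }

    private String parseName() {
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("expected a tag name at " + pos);
        }
        return input.substring(start, pos);
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at " + pos);
        }
        pos++;
    }
}
```

Each grammar rule maps to one method, so supporting attributes later would mostly mean extending the part of parseElement that reads the start tag.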
Upvotes: 1
Reputation: 6783
If you're not dealing with a huge XML file, consider a DOM parser. I would suggest looking at the DocumentBuilder class: call one of its overloaded parse() methods (your source can be a file or any other InputSource). parse() throws a SAXException when the input is not well formed.
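A minimal sketch of that approach, assuming the XML is held in a String and that a thrown SAXException is treated as "not well formed". The class name DomWellFormedCheck is made up for this example; the calls used are from the standard javax.xml.parsers and org.xml.sax packages.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomWellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Wrap the String in an InputSource so no file is needed.
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports malformed input as a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory String; surfaced as a setup failure.
            throw new IllegalStateException(e);
        }
    }
}
```

This hands the whole job to the JDK parser, so it may not fit an assignment that asks you to write the checking logic yourself.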
Upvotes: 0