fozziethebeat

Reputation: 1180

Parsing XML without quotes in Scala

I'm trying to parse some poorly generated XML with Scala. The input looks like this:

<contextfile concordance=brown>
<context filename=br-a01 paras=yes>
<p pnum=1>
<s snum=1> 
<wf cmd=ignore pos=DT>The</wf>
</s>
</p>
...

It's well structured, but as you can see there are no quotes surrounding any of the attribute values. Simply opening the file with the Scala snippet below throws a not-so-surprising error:

val semCor = XML.loadFile(args(0)) 

throws

org.xml.sax.SAXParseException: Open quote is expected for attribute "{1}" associated with an  element type  "concordance".

I'd like to know how, if it is at all possible, to set up the Scala XML parser to correctly parse this input as if there were quotes surrounding the attribute values.

Thanks for any suggestions!

Upvotes: 2

Views: 881

Answers (3)

Matthew Farwell

Reputation: 61705

It is not possible to configure the parser to that extent in Scala. However, since your XML is malformed, you could use an HTML tidy library such as JSoup or TagSoup to tidy your XML first and then parse it with Scala XML. Or just get the data you want from the XML using JSoup directly.

Upvotes: 7

Michael Kay

Reputation: 163322

Why do you refer to this as XML? It isn't. You might as well refer to a Scala program as a C# program. No XML parser will make any sense of it at all. You are using a completely proprietary format for your data interchange, and you have two choices: switch to using XML instead, or write a completely proprietary parser for it.

Upvotes: 2

Vincent Biragnet

Reputation: 2998

It's not possible to configure the parser. Your parser won't accept XML that is not well-formed. Maybe you should consider a first pass to add the quotes. In the general case, it's not possible to know how to deal with this problem, but it can be very easy in specific cases, for example if attribute values don't contain any whitespace, quote, "&" or "<" characters.
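
A minimal sketch of such a first pass, under exactly the assumptions stated above: attribute values contain no whitespace, quotes, "&", "<", ">" or "=" characters. It also assumes there are no self-closing tags and that no already-quoted value contains text that itself looks like name=value. The object and method names (QuoteAttributes, quoteAttributes) are made up for illustration and are not part of any library.

```scala
import scala.util.matching.Regex

object QuoteAttributes {
  // A start tag: '<' followed by anything except '/', '!' or '?',
  // so closing tags, comments and processing instructions are skipped.
  private val StartTag: Regex = "<[^<>/!?][^<>]*>".r

  // An unquoted attribute inside a start tag: whitespace, name, '=', bare value.
  // The value must not begin with a quote, so quoted attributes are left alone.
  private val UnquotedAttr: Regex = """(\s)([\w:.-]+)=([^\s"'<>=]+)""".r

  /** Wraps every bare attribute value found inside a start tag in double quotes. */
  def quoteAttributes(raw: String): String =
    StartTag.replaceAllIn(raw, m => {
      val fixed = UnquotedAttr.replaceAllIn(m.matched, "$1$2=\"$3\"")
      // Escape '$' and '\' so the rewritten tag is inserted literally.
      Regex.quoteReplacement(fixed)
    })
}
```

If the whole file fits in memory, the cleaned string could then be handed to the regular parser, for example with XML.loadString(QuoteAttributes.quoteAttributes(rawText)). Text outside of tags is not touched, so content such as "x=y" in the body of an element stays as it is.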

Upvotes: 0
