Reputation: 1180
I'm trying to parse some poorly generated XML with Scala that looks like this:
<contextfile concordance=brown>
<context filename=br-a01 paras=yes>
<p pnum=1>
<s snum=1>
<wf cmd=ignore pos=DT>The</wf>
</s>
</p>
...
It's well structured, but as you can see there are no quotes surrounding any of the attribute values. Simply opening the file with the Scala snippet below throws a not-so-surprising error:
val semCor = XML.loadFile(args(0))
throws
org.xml.sax.SAXParseException: Open quote is expected for attribute "{1}" associated with an element type "concordance".
I'd like to know how, if it is at all possible, to set up the Scala XML parser to correctly parse this input as if there were quotes surrounding the attribute values.
Thanks for any suggestions!
Upvotes: 2
Views: 881
Reputation: 61705
It is not possible to configure the parser to that extent in Scala. However, since your XML is malformed, you could use an HTML tidy library such as JSoup or TagSoup to tidy your XML first and then parse it with Scala XML. Or just get the data you want from the XML using JSoup directly.
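A minimal sketch of the JSoup route, assuming the jsoup library is on the classpath. The object name `TidyWithJsoup` and its two helpers are hypothetical names made up for this illustration; only `Jsoup.parse`, `Parser.xmlParser()`, `select`, `attr`, `text` and `outerHtml` come from jsoup itself.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.parser.Parser

// Hypothetical helper object, not part of jsoup or Scala XML.
object TidyWithJsoup {
  // Parse the malformed input with jsoup's lenient XML parser,
  // which tolerates unquoted attribute values.
  def parseLenient(raw: String): Document =
    Jsoup.parse(raw, "", Parser.xmlParser())

  // Re-serialize the parsed tree; jsoup writes attribute values quoted.
  def tidy(raw: String): String =
    parseLenient(raw).outerHtml()
}
```

The string returned by `tidy` could then be handed to `XML.loadString`, or the data could be read straight from the jsoup document, for example `parseLenient(raw).select("wf").first().attr("pos")`.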
Upvotes: 7
Reputation: 163322
Why do you refer to this as XML? It isn't. You might as well refer to a Scala program as a C# program. No XML parser will make any sense of it at all. You are using a completely proprietary format for your data interchange, and you have two choices: switch to using XML instead, or write a completely proprietary parser for it.
Upvotes: 2
Reputation: 2998
It's not possible to configure the parser that way: an XML parser won't accept input that is not well formed. Maybe you should consider a first pass that adds the quotes. In the general case there is no reliable way to do this, but it can be very easy in specific cases, for example if attribute values don't contain any whitespace, quote, "&" or "<" characters.
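A rough sketch of such a first pass, using only the Scala standard library. It assumes every attribute in the input is unquoted and that values contain no whitespace, quotes, "=", "&", "<" or ">" characters; input that mixes in already quoted values is not handled. `QuoteAttributes` and `quoteAttributes` are hypothetical names chosen for this example.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of any library.
object QuoteAttributes {
  // A start tag: "<" followed by a letter, up to the next ">".
  // Closing tags such as </wf> and text between tags are not matched.
  private val StartTag: Regex = """<[A-Za-z][^<>]*>""".r

  // name=value with an unquoted value, ending at whitespace, ">" or "/>".
  private val BareAttr: Regex =
    """([A-Za-z_:][\w:.-]*)=([^\s"'<>&=]+?)(?=\s|/?>)""".r

  // Wrap every unquoted attribute value inside start tags in double quotes.
  def quoteAttributes(raw: String): String =
    StartTag.replaceAllIn(raw, tag =>
      Regex.quoteReplacement(
        BareAttr.replaceAllIn(tag.matched, m =>
          Regex.quoteReplacement(m.group(1) + "=\"" + m.group(2) + "\""))))
}
```

With the file contents read into a string `raw`, the result of `QuoteAttributes.quoteAttributes(raw)` could be passed to `XML.loadString` instead of calling `XML.loadFile` on the original file. Restricting the rewrite to start tags keeps text such as `a=b` between tags untouched.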
Upvotes: 0