Reputation: 623
I'm not sure how to approach the following problem. Let's say I have a log file like this:
asdasdçkpoiwqe
askdjadlskjqw
<stuff>
<a>some val</a>
<b>some val</b>
</stuff>
kasdjllasdj
clkj
skdjalkd
<moreStuff>
<c>some val</c>
<d>some val</d>
</moreStuff>
iuoudnas
salkdj
sdmlaks
<moreStuff>
<c>more val</c>
<d>some val</d>
</moreStuff>
...
That is, I have some junk text with some well-formed XML structures in the middle. I want to parse this file and convert the XML to case classes, so I defined:
case class Stuff(a: String, b: String)
case class MoreStuff(c: String, d: String)
and this code:
val filename = "logFile.log"
for (line <- Source.fromFile(filename).getLines) {
line match {
case "<stuff>" => parseStuff(line)
case "<moreStuff>" => parseMoreStuff(line)
case _ => println("Not Defined"+ line)
}
}
def parseStuff(line: String) = {
//Create a List[Stuff]
}
def parseMoreStuff(line: String) = {
//Create a List[MoreStuff]
}
but clearly this doesn't work, because when the for loop matches, the only line passed to the methods is <stuff> or <moreStuff>. Then I thought I could pass the iterator to the methods and call next inside them. Something like this:
def parseMoreStuff(line: String, it: Iterator) = {
var l = line
while(!line.equals("</moreStuff>")){
l += line
it.next()
}
and now I have a single String l containing only the XML content, which I can treat as XML. I ran this code and got a java.util.NoSuchElementException: next on empty iterator. Anyway, I think this approach is a big mess (even if I could fix the exception). I don't like it, so my question is whether there's a cleaner way to parse a log file with these characteristics.
thanks in advance
Upvotes: 0
Views: 1137
Reputation: 19527
One approach is to first ignore the junk text:
import scala.io.Source

// Keep only the lines that look like XML and join them into one String
val xmlAsString =
  Source.fromFile(filename)
    .getLines
    .map(_.trim)
    .filter(_.startsWith("<"))
    .mkString
// <stuff><a>some val</a><b>some val</b></stuff><moreStuff><c>some val</c><d>some val</d></moreStuff><moreStuff><c>more val</c><d>some val</d></moreStuff>
Note that in the above code I convert the Iterator to a String, so this could be an issue if the XML content in your file is too large to fit in memory.
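If memory is a concern, one alternative is to pull out one XML block at a time instead of building a single String. The following is only a minimal sketch using the standard library: `LogBlocks` and `xmlBlocks` are hypothetical names, and it assumes that each opening and closing tag sits on its own line (as in the sample log) and that blocks of interest are not nested inside each other.

```scala
object LogBlocks {
  // Hypothetical helper: lazily yields each top-level XML block as one String.
  // `tags` lists the root element names to look for, e.g. Set("stuff", "moreStuff").
  def xmlBlocks(lines: Iterator[String], tags: Set[String]): Iterator[String] =
    new Iterator[String] {
      private val in = lines.map(_.trim).buffered

      // Skip junk lines until the next known opening tag (or the end of input).
      private def skipJunk(): Unit =
        while (in.hasNext && !tags.exists(t => in.head == s"<$t>")) in.next()

      def hasNext: Boolean = { skipJunk(); in.hasNext }

      def next(): String = {
        skipJunk()
        val open = in.next()            // e.g. "<stuff>"; throws if the input is exhausted
        val close = "</" + open.drop(1) // e.g. "</stuff>"
        val sb = new StringBuilder(open)
        var done = false
        // Append lines up to and including the matching closing tag.
        while (!done && in.hasNext) {
          val line = in.next()
          sb.append(line)
          done = line == close
        }
        sb.toString
      }
    }
}
```

Each String returned by the iterator is a small, self-contained fragment that can be passed to an XML parser on its own, so only one block is held in memory at a time. Unlike the loop in the question, the loop condition here depends on the line that was just read, and it also stops when the input runs out.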
Next, using Scala's standard XML library (which, as of Scala 2.11, lives in the separate scala-xml module and must be added as a dependency), aggregate the XML fragments into one XML document (to make this composite document well-formed, add a root element):
import scala.xml._
val xmlDoc = XML.loadString("<stuffRoot>" + xmlAsString + "</stuffRoot>")
Then, to obtain a Seq of Stuffs and a Seq of MoreStuffs:
// .text returns the element's content ("some val");
// .toString would return the markup ("<a>some val</a>")
def parseStuff(node: Node): Stuff = {
  Stuff((node \ "a").text, (node \ "b").text)
}
def parseMoreStuff(node: Node): MoreStuff = {
  MoreStuff((node \ "c").text, (node \ "d").text)
}
val stuffs = (xmlDoc \ "stuff").map(parseStuff) // Seq[Stuff]
val moreStuffs = (xmlDoc \ "moreStuff").map(parseMoreStuff) // Seq[MoreStuff]
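Putting the steps together, here is a hedged end-to-end sketch that can be tried on an in-memory sample instead of a file. It assumes scala-xml is on the classpath; the `ParseLogXml` object and its `parse` method are names made up for this illustration. It reads element content with `.text`, since calling `toString` on a NodeSeq returns the serialized markup rather than the value.

```scala
import scala.xml.XML

object ParseLogXml {
  case class Stuff(a: String, b: String)
  case class MoreStuff(c: String, d: String)

  // Hypothetical helper: filters out junk lines, wraps the remaining XML
  // in an artificial root element, and maps the children to case classes.
  def parse(lines: Iterator[String]): (Seq[Stuff], Seq[MoreStuff]) = {
    val xmlAsString = lines.map(_.trim).filter(_.startsWith("<")).mkString
    val doc = XML.loadString("<stuffRoot>" + xmlAsString + "</stuffRoot>")
    val stuffs = (doc \ "stuff").map(n => Stuff((n \ "a").text, (n \ "b").text))
    val moreStuffs = (doc \ "moreStuff").map(n => MoreStuff((n \ "c").text, (n \ "d").text))
    (stuffs, moreStuffs)
  }
}
```

For a real log you would pass `Source.fromFile(filename).getLines` as the argument. Keep in mind that the filter treats any line starting with `<` as XML, so a junk line that happens to begin with that character would break the parse.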
Upvotes: 1