MLeiria

Reputation: 623

Parse a log file and convert to case classes in Scala

I'm not sure how to approach the following problem. Let's say I have a log file like this:

asdasdçkpoiwqe
askdjadlskjqw
<stuff>
    <a>some val</a>
    <b>some val</b>
</stuff>
kasdjllasdj
clkj
skdjalkd
<moreStuff>
    <c>some val</c>
    <d>some val</d>
</moreStuff>
iuoudnas
salkdj
sdmlaks
<moreStuff>
    <c>more val</c>
    <d>some val</d>
 </moreStuff>
...

That is, I have some junk text and, in the middle of it, some well-formed XML structures. I want to parse this file and convert the XML to case classes, so I defined:

case class Stuff(a: String, b: String)

case class MoreStuff(c: String, d: String)

and this code:

import scala.io.Source

val filename = "logFile.log"
for (line <- Source.fromFile(filename).getLines()) {
  line match {
    case "<stuff>" => parseStuff(line)
    case "<moreStuff>" => parseMoreStuff(line)
    case _ => println("Not Defined " + line)
  }
}

def parseStuff(line: String) = {
  //Create a List[Stuff] 
}

def parseMoreStuff(line: String) = {
  //Create a List[MoreStuff]
}

But clearly this doesn't work, because when the for loop matches, the only line passed to the methods is <stuff> or <moreStuff>.

Then I thought I could pass the iterator to the methods and call next inside them. Something like this:

def parseMoreStuff(line: String, it: Iterator[String]) = {
  var l = line
  while (!line.equals("</moreStuff>")) {
    l += line
    it.next()
  }
}

Now I have a single String l containing only the XML content, which I can treat as XML. I ran this code and got a java.util.NoSuchElementException: next on empty iterator. In any case, I think this approach is a big mess (even if I could fix the exception), and I don't like it. So my question is whether there is a cleaner way to parse a log file with these characteristics.

thanks in advance

Upvotes: 0

Views: 1137

Answers (1)

Jeffrey Chung

Reputation: 19527

One approach is to first ignore the junk text:

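import scala.io.Source

// Keep only the lines that look like XML and join them into a single string
val xmlAsString =
  Source.fromFile(filename)
        .getLines()
        .map(_.trim)
        .filter(_.startsWith("<"))
        .mkString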

// <stuff><a>some val</a><b>some val</b></stuff><moreStuff><c>some val</c><d>some val</d></moreStuff><moreStuff><c>more val</c><d>some val</d></moreStuff>

Note that in the above code I convert the Iterator to a String, so this could be an issue if the XML content in your file is too large to fit in memory.
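If memory is a concern, one alternative is to walk the iterator line by line and hand over each XML fragment as soon as its closing tag is seen. The following is only a minimal sketch, assuming every opening and closing tag of a top-level element sits on its own line, as in the sample log. The names foreachXmlFragment, rootTags and handle are hypothetical and not part of any library.

```scala
// Hypothetical helper: calls `handle` once per complete top-level XML fragment.
// Assumes each root opening/closing tag is alone on its line (after trimming).
def foreachXmlFragment(lines: Iterator[String], rootTags: Set[String])(handle: String => Unit): Unit = {
  val trimmed = lines.map(_.trim)
  while (trimmed.hasNext) {
    val line = trimmed.next()
    // Junk lines match no root tag and are simply skipped
    rootTags.find(tag => line == s"<$tag>").foreach { tag =>
      val closing = s"</$tag>"
      val sb = new StringBuilder(line)
      var closed = false
      // Checking hasNext before next() avoids NoSuchElementException on a truncated log
      while (!closed && trimmed.hasNext) {
        val inner = trimmed.next()
        sb.append(inner)
        closed = inner == closing
      }
      // An unterminated fragment at the end of the file is dropped
      if (closed) handle(sb.toString)
    }
  }
}
```

Each string passed to handle is a complete element, so it could be given to XML.loadString on its own instead of being wrapped in a root element together with all the others.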

Next, using Scala's XML library (which, as of Scala 2.11, ships as the separate scala-xml module and must be added as a dependency), aggregate the XML fragments into one XML document. To make this composite document well-formed, wrap the fragments in a root element:

import scala.xml._

val xmlDoc = XML.loadString("<stuffRoot>" + xmlAsString + "</stuffRoot>")

Then, to obtain a Seq of Stuffs and a Seq of MoreStuffs:

// Use .text to get the element's content; .toString would return the markup, e.g. "<a>some val</a>"
def parseStuff(node: Node): Stuff = {
  Stuff((node \ "a").text, (node \ "b").text)
}

def parseMoreStuff(node: Node): MoreStuff = {
  MoreStuff((node \ "c").text, (node \ "d").text)
}

val stuffs = (xmlDoc \ "stuff").map(parseStuff) // Seq[Stuff]
val moreStuffs = (xmlDoc \ "moreStuff").map(parseMoreStuff) // Seq[MoreStuff]
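As a quick check of the whole pipeline, here is a minimal sketch that runs the same steps on an in-memory sample instead of a file. The sampleLines list is an assumption standing in for the log contents, and the snippet is meant to be run as a script or in the REPL with scala-xml on the classpath.

```scala
import scala.xml._

case class Stuff(a: String, b: String)
case class MoreStuff(c: String, d: String)

// Sample input standing in for the log file (an assumption for illustration)
val sampleLines = List(
  "askdjadlskjqw",
  "<stuff>",
  "    <a>some val</a>",
  "    <b>some val</b>",
  "</stuff>",
  "clkj",
  "<moreStuff>",
  "    <c>more val</c>",
  "    <d>some val</d>",
  " </moreStuff>"
)

// Drop the junk lines, then wrap the fragments in a root element
val xmlAsString = sampleLines.map(_.trim).filter(_.startsWith("<")).mkString
val xmlDoc = XML.loadString("<stuffRoot>" + xmlAsString + "</stuffRoot>")

// .text yields the element's text content rather than its markup
val stuffs = (xmlDoc \ "stuff").map(n => Stuff((n \ "a").text, (n \ "b").text))
val moreStuffs = (xmlDoc \ "moreStuff").map(n => MoreStuff((n \ "c").text, (n \ "d").text))

println(stuffs)
println(moreStuffs)
```

With this sample, stuffs should hold a single Stuff("some val", "some val") and moreStuffs a single MoreStuff("more val", "some val").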

Upvotes: 1
