Steve McAffer

Reputation: 385

Parse a log file with scala

I am trying to parse a text file. My input file looks like this:

ID:   12343-7888
Name:  Mary, Bob, Jason, Jeff, Suzy
           Harry, Steve
           Larry, George
City:   New York, Portland, Dallas, Kansas City
        Tampa, Bend   

Expected output would be:

"12343-7888"
"Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George"
"New York, Portland, Dallas, Kansas City, Tampa, Bend"

Note that the "Name" and "City" fields contain line breaks. I have the code below, but it is not working: the second line of code puts each character on its own line. I am also having trouble grabbing only the data from a field, for example returning just the actual names without the "Name: " label. Finally, I would like to put quotes around each returned field.

Can you help fix up my problems?

import scala.io.Source

val lines = Source.fromFile("/filesdata/logfile.text").getLines().toList
val record = lines.dropWhile(line => !line.startsWith("Name: ")).takeWhile(line  => !line.startsWith("Address: ")).flatMap(_.split(",")).map(_.trim()).filter(_.nonEmpty).mkString(", ")
val finalResults = record.map(s => "\"" + s + "\"").mkString(",\n")

How can I get the results that I am looking for?

Upvotes: 1

Views: 1186

Answers (1)

Andrey Tyukin

Reputation: 44957

SHORT ANSWER

A two-liner that produces a string that looks exactly as you specified:

println(lines.map{line => if(line.trim.matches("[a-zA-Z]+:.*")) 
  ("\"\n\"" + line.split(":")(1).trim) else (", " + line.trim)}.mkString.drop(2) + "\"")
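The two-liner relies on a `lines` list like the one in the question. As a self-contained sketch with the sample input inlined (the names `ShortAnswerDemo`, `sample` and `quoted` are assumptions made up for this illustration), the same logic reads:

```scala
object ShortAnswerDemo {
  // Sample input from the question, inlined so the sketch runs on its own.
  val sample: List[String] = List(
    "ID:   12343-7888",
    "Name:  Mary, Bob, Jason, Jeff, Suzy",
    "           Harry, Steve",
    "           Larry, George",
    "City:   New York, Portland, Dallas, Kansas City",
    "        Tampa, Bend   "
  )

  val quoted: String = sample.map { line =>
    if (line.trim.matches("[a-zA-Z]+:.*"))
      "\"\n\"" + line.split(":")(1).trim // labeled line: close the previous field, open a new one
    else
      ", " + line.trim                   // continuation line: append to the current field
  }.mkString.drop(2) + "\""              // drop the leading quote and newline, close the last field

  def main(args: Array[String]): Unit = println(quoted)
}
```

Note that `split(":")(1)` keeps only the text between the first and second colon, so this sketch assumes field values never contain a colon themselves; `split(":", 2)` would be one way to relax that.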

LONG ANSWER

Why try to solve something in one line, if you can achieve the same thing in 94?

(That's the exact opposite of the usual slogan when working with Scala collections, but the input was sufficiently messy that I found it worthwhile to actually write out some of the intermediate steps. Maybe that's just because I've bought a nice new keyboard recently...)

val input = """ID:   12343-7888
Name:  Mary, Bob, Jason, Jeff, Suzy
           Harry, Steve
           Larry, George
City:   New York, Portland, Dallas, Kansas City
        Tampa, Bend
ID: 567865-676
Name: Alex, Bob 
  Chris, Dave 
     Evan, Frank
   Gary
City: Los Angeles, St. Petersburg
   Washington D.C., Phoenix
"""

case class Entry(id: String, names: List[String], cities: List[String])

def parseMessyInput(input: String): List[Entry] = {

  // just a first rough approximation of the structure of the input
  sealed trait MessyInputLine { def content: String }
  case class IdLine(content: String) extends MessyInputLine
  case class NameLine(content: String) extends MessyInputLine
  case class UnlabeledLine(content: String) extends MessyInputLine
  case class CityLine(content: String) extends MessyInputLine

  val lines = input.split("\n").toList

  // a helper function for checking whether a line starts with a label
  def tryParseLabeledLine
    (label: String, line: String)
    (cons: String => MessyInputLine)
  : Option[MessyInputLine] = {
    if (line.startsWith(label + ":")) {
      Some(cons(line.drop(label.size + 1)))
    } else {
      None
    }
  }

  val messyLines: List[MessyInputLine] = for (line <- lines) yield {
    (
      tryParseLabeledLine("Name", line){NameLine(_)} orElse
      tryParseLabeledLine("City", line){CityLine(_)} orElse
      tryParseLabeledLine("ID", line){IdLine(_)}
    ).getOrElse(UnlabeledLine(line))
  }

  /** Combines the content of the first line with the content
    * of all unlabeled lines, until the next labeled line or
    * the end of the list is hit. Returns the content of 
    * the first few lines and the list of the remaining lines.
    */
  def readUntilNextLabel(messyLines: List[MessyInputLine])
  : (List[String], List[MessyInputLine]) = {
    messyLines match {
      case Nil => (Nil, Nil)
      case h :: t => {
        val (unlabeled, rest) = t.span {
          case UnlabeledLine(_) => true
          case _ => false
        }
        (h.content :: unlabeled.map(_.content), rest)
      }
    }
  }

  /** Glues multiple lines to entries */
  def combineToEntries(messyLines: List[MessyInputLine]): List[Entry] = {
    if (messyLines.isEmpty) Nil
    else {
      val (idContent, namesCitiesRest) = readUntilNextLabel(messyLines)
      val (namesContent, citiesRest) = readUntilNextLabel(namesCitiesRest)
      val (citiesContent, rest) = readUntilNextLabel(citiesRest)
      val id = idContent.head.trim
      val names = namesContent.map(_.split(",").map(_.trim).toList).flatten
      val cities = citiesContent.map(_.split(",").map(_.trim).toList).flatten
      Entry(id, names, cities) :: combineToEntries(rest)
    }
  }

  // invoke recursive function on the entire input
  combineToEntries(messyLines)
}

// how to use
val entries = parseMessyInput(input)

// output
for (Entry(id, names, cities) <- entries) {
  println(id)
  println(names.mkString(", "))
  println(cities.mkString(", "))
}

Output:

12343-7888
Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George
New York, Portland, Dallas, Kansas City, Tampa, Bend
567865-676
Alex, Bob, Chris, Dave, Evan, Frank, Gary
Los Angeles, St. Petersburg, Washington D.C., Phoenix
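
The question also asked for quotes around each field, which the loop above does not add. A minimal sketch of one way to do that, assuming an `Entry` of the same shape as above; `QuotedOutput`, `quote` and `render` are hypothetical names introduced only for this example:

```scala
object QuotedOutput {
  // Same shape as the Entry case class in the answer, repeated so the sketch compiles on its own.
  case class Entry(id: String, names: List[String], cities: List[String])

  // Hypothetical helper: wraps a single field in double quotes.
  def quote(s: String): String = "\"" + s + "\""

  // Hypothetical helper: renders an entry as one quoted line per field.
  def render(e: Entry): String =
    List(e.id, e.names.mkString(", "), e.cities.mkString(", "))
      .map(quote)
      .mkString("\n")
}
```

With the parser above, something like `entries.foreach(e => println(render(e)))` should then print each field in quotes, provided `render` is applied to the same `Entry` type that the parser returns.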

You probably could write it down in one line, sooner or later. But if you write dumb code consisting of many simple intermediate steps, you don't have to think that hard, and there are no obstacles large enough to get stuck on.
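
For comparison, a possible middle ground between the two versions is a single `foldLeft` that attaches each unlabeled line to the most recent label. This is only a sketch of a different technique, not the approach used above; `GroupByLabel`, `group` and `items` are hypothetical names, and the label regex is borrowed from the short answer:

```scala
object GroupByLabel {
  // Matches "Label:rest"; a label is a run of letters, as in the short answer's regex.
  private val Labeled = "([a-zA-Z]+):(.*)".r

  // Splits a comma-separated chunk into trimmed, non-empty items.
  private def items(s: String): List[String] =
    s.split(",").map(_.trim).filter(_.nonEmpty).toList

  /** Groups each labeled line with its continuation lines, in input order. */
  def group(lines: List[String]): List[(String, List[String])] = {
    val reversed = lines.foldLeft(List.empty[(String, List[String])]) { (acc, line) =>
      (line.trim, acc) match {
        // a new label starts a new group
        case (Labeled(label, rest), _) => (label, items(rest)) :: acc
        // an unlabeled line extends the most recent group
        case (other, (label, values) :: tail) => (label, values ++ items(other)) :: tail
        // an unlabeled line before any label is ignored
        case (_, Nil) => acc
      }
    }
    reversed.reverse
  }
}
```

The result is a flat list of `(label, values)` pairs; turning consecutive `ID`, `Name` and `City` groups into `Entry` values would still be a separate step.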

Upvotes: 2
