andyczerwonka

Reputation: 4260

Parsing a text file into groups using Scala

I have a CSV file that is really a set of many CSV files in one. Something like this:

"First Part"
"Some", "data", "in", "here"
"More", "stuff", "over", "here"

"Another Part"
"This", "section", "is", "not", "the", "same", "as", "the", "first"
"blah", "blah", "blah", "blah", "blah", "blah", "blah", "blah", "blah"

"Yet another section"
"And", "this", "is", "yet", "another"
"blah", "blah", "blah", "blah", "blah"

I'd like to break it into separate components. Given that I know the header for each section, it would be nice to do some kind of groupBy where I pass in a set of regexps representing header patterns and get back a Seq[Seq[String]] or something similar.

Upvotes: 1

Views: 1237

Answers (2)

Emil L

Reputation: 21081

You could do the following:

// Assumes `input` is a String holding the whole file's contents
val groups = List("\"First Part\"", "\"Another Part\"", "\"Yet another section\"")
val accumulator = List[List[String]]()
val result = input.split("\n").foldLeft(accumulator)((acc, e) => {
  if (groups.contains(e)) {
    // Start a new, empty group when the line matches one of the headings
    Nil :: acc
  } else {
    // Prepend the line to the current (most recent) group
    val newHead = e :: acc.head
    newHead :: acc.tail
  }
})

Each list in result now represents a group. If you want to use regexes to find your matches, just replace groups.contains(e) with a regex match test. There are some subtleties here that deserve a mention:

  • The algorithm fails if the input does not start with a heading, because acc.head is called on an empty list.
  • If a heading appears several times, each occurrence starts a new group.
  • Each group contains its lines in reverse order, and the groups themselves come out in reverse order (last section first).
  • Empty lines are also included in the result.
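
As a rough sketch of how the points above might be handled (this is not part of the original answer): the hypothetical helper groupByHeaders below takes the header patterns as regexes, skips blank lines, tolerates input that does not start with a heading, and restores the original order. The name, the signature, and the choice to put lines that precede any heading into a group of their own are all assumptions.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not from the original answer.
// Splits `lines` into groups; a line that fully matches any regex in
// `headers` starts a new group. Header lines themselves are not kept.
def groupByHeaders(lines: Seq[String], headers: Seq[Regex]): List[List[String]] = {
  def isHeader(line: String): Boolean =
    headers.exists(_.pattern.matcher(line).matches())

  lines
    .map(_.trim)
    .filter(_.nonEmpty) // drop blank separator lines
    .foldLeft(List.empty[List[String]]) {
      // Heading: open a new, empty group
      case (acc, line) if isHeader(line) => Nil :: acc
      // Data before any heading (assumption): give it a group of its own
      case (Nil, line) => List(line) :: Nil
      // Ordinary data line: prepend to the current group
      case (current :: rest, line) => (line :: current) :: rest
    }
    .map(_.reverse) // restore line order inside each group
    .reverse        // restore group order
}

// Example header pattern: a line consisting of a single quoted field
val singleQuotedField: Regex = """^"[^"]*"$""".r
```

It could then be called as groupByHeaders(input.split("\n").toSeq, Seq(singleQuotedField)), where input is the same String assumed above. A heading that appears several times still starts a new group each time.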

Upvotes: 1

Ivan Meredith

Reputation: 2222

EDIT: This is similar to the other solution, which was posted at the same time. The same kind of heading test could be used here instead of my quick hack of size == 1. This solution has the added benefit of including the section name, so ordering doesn't matter.

val file: List[String] = """

heading
1,2,3
4,5

heading2
5,6
""".split("\n").toList
val splitFile = file
  .map(_.split(",").toList)
  .filterNot(_ == List(""))
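  .foldLeft(List[(String, List[List[String]])]()) {
    case (h :: t, l) =>
      if (l.size == 1) (l(0), List()) :: h :: t // single field: start a new section
      else (h._1, l :: h._2) :: t               // data row: prepend to the current section
    case (Nil, l) =>
      if (l.size == 1) List((l(0), List()))     // first heading
      else List()                               // data before any heading is dropped
  }
  .reverse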

This produces:

 splitFile: List[(String, List[List[String]])] = List((heading,List(List(4, 5), List(1, 2, 3))), (heading2,List(List(5, 6))))
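
For illustration only, here is a sketch of the variant mentioned in the edit note: a line counts as a heading when it matches a regex, not when it happens to have exactly one field. The helper name sections, its signature, and the heading pattern in the usage note are assumptions, not part of the original answer. It also reverses each section's rows at the end so they come back in file order.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not from the original answer.
// Returns (section name, rows) pairs; a line is a heading when it fully
// matches `heading`. Rows are split on commas and trimmed.
def sections(lines: List[String], heading: Regex): List[(String, List[List[String]])] =
  lines
    .map(_.trim)
    .filter(_.nonEmpty) // skip blank lines
    .foldLeft(List.empty[(String, List[List[String]])]) {
      // Heading: open a new section with no rows yet
      case (acc, line) if heading.pattern.matcher(line).matches() =>
        (line, List.empty[List[String]]) :: acc
      // Row before the first heading: dropped, as in the fold above
      case (Nil, _) => Nil
      // Data row: prepend the parsed fields to the current section
      case ((name, rows) :: rest, line) =>
        (name, line.split(",").map(_.trim).toList :: rows) :: rest
    }
    .map { case (name, rows) => (name, rows.reverse) } // rows back in file order
    .reverse                                           // sections back in file order
```

With the file value from above, sections(file, "heading\\d*".r) should give the same two sections, with the rows of the first one in file order (1,2,3 before 4,5).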

Upvotes: 0
