Jacek Laskowski

Reputation: 74719

How to parse a CSV file, skipping lines with an odd number of attributes, as a one-liner?

I'd like to solicit feedback (aka code review) on the following method, which parses a CSV file and skips lines with an odd number of attributes, e.g. the 2nd line in the CSV file below:

e,2,3,13,k1,v1,k2,v2
e,2,2,10,k1,v1,k2  // this line should be skipped
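The question's getEdges takes an already-split Seq[Seq[String]]. A minimal sketch of producing that from raw text, assuming the file has no quoted fields or embedded commas (so a plain split on "," is enough) and that the "// this line should be skipped" note is not part of the file. parseRows is a hypothetical helper name.

```scala
// Sample input from the question, without the explanatory comment.
val csvText =
  """e,2,3,13,k1,v1,k2,v2
    |e,2,2,10,k1,v1,k2""".stripMargin

// Hypothetical helper: split into rows, then into fields.
// Assumes no quoting/escaping; split(",", -1) keeps trailing empty fields.
def parseRows(text: String): Seq[Seq[String]] =
  text.linesIterator
    .filter(_.trim.nonEmpty)
    .map(_.split(",", -1).toSeq)
    .toSeq

val rows = parseRows(csvText)

// The first 4 columns are fixed ("e", id, source, target); the rest are attributes.
rows.foreach { r =>
  val fixed = r.take(4).mkString(",")
  println(s"$fixed -> ${r.drop(4).size} attributes")
}
```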

I'm concerned that I have to use Option to skip incorrect lines. I'm thinking of using foldLeft instead. Is there a better approach?

def getEdges(seq: Seq[Seq[String]]): Seq[Edge] =
  seq.filter(_.head == "e").map { case Seq("e", i, s, t, attrs @ _*) =>
    if (attrs.size % 2 != 0) {
      println(s"Incorrect edge - odd number of attributes [${attrs.size}] for id=[$i]...skipping")
      None
    } else {
      val attrsM = attrs.grouped(2).toList.map(l => l.head -> l.tail.head).toMap + ("guid" -> i)
      Some(Edge(i, s, t, attrsM))
    }
  }.filterNot(_ == None).map(_.get)
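For comparison, the foldLeft idea mentioned above might look roughly like this. It is only a sketch: Edge is not shown in the question, so the four-field case class below is an assumption. The accumulator is built by prepending and reversed at the end, which is part of why this ends up more verbose than a filter/collect.

```scala
// Hypothetical Edge definition; the question does not show the real one.
case class Edge(id: String, src: String, dst: String, attrs: Map[String, String])

def getEdgesFold(seq: Seq[Seq[String]]): Seq[Edge] =
  seq.foldLeft(List.empty[Edge]) { (acc, row) =>
    row match {
      case Seq("e", i, s, t, attrs @ _*) if attrs.size % 2 == 0 =>
        val attrsM = attrs.grouped(2).map { case Seq(k, v) => k -> v }.toMap + ("guid" -> i)
        Edge(i, s, t, attrsM) :: acc // prepend is O(1) on List
      case Seq("e", i, _, _, attrs @ _*) =>
        // Only reached when the guard above failed, i.e. an odd attribute count.
        println(s"Incorrect edge - odd number of attributes [${attrs.size}] for id=[$i]...skipping")
        acc
      case _ => acc // not an edge row, or too short to be one
    }
  }.reverse // undo the prepend order
```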

Upvotes: 2

Views: 176

Answers (4)

almendar

Reputation: 1813

flatMap will automatically remove the Nones:

List(Some(1), None).flatMap(x => x) == List(1)

That might make it simpler.
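Applied to the question's method, this means keeping the Option-returning branches and replacing map followed by filterNot(_ == None).map(_.get) with a single flatMap. A sketch, again assuming a hypothetical four-field Edge case class:

```scala
// Hypothetical Edge definition; the question does not show the real one.
case class Edge(id: String, src: String, dst: String, attrs: Map[String, String])

def getEdges(seq: Seq[Seq[String]]): Seq[Edge] =
  seq.filter(_.headOption.contains("e")).flatMap {
    case Seq("e", i, s, t, attrs @ _*) if attrs.size % 2 == 0 =>
      val attrsM = attrs.grouped(2).map { case Seq(k, v) => k -> v }.toMap + ("guid" -> i)
      Some(Edge(i, s, t, attrsM))
    case row =>
      println("Skipping row " + row.mkString(","))
      None // flatMap drops None, so no filterNot/get is needed
  }
```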

EDIT

From the discussion in the comments, it seems the best way is to avoid Option altogether and use either a combination of filter and map, or the collect function, which does both in one pass.

Sample code:

List(List(1,2,3),List(1,2,3,4),List(2,3,4)).collect{ case arg @ List(1, rest @ _*) if(rest.length%2==0) => arg.sum } 
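The two options from the edit give the same result on that sample; collect just combines the test and the transformation in one pass. A small side-by-side comparison:

```scala
val input = List(List(1, 2, 3), List(1, 2, 3, 4), List(2, 3, 4))

// Two passes: keep lists that start with 1 and have an even number of
// remaining elements, then sum them.
val viaFilterMap = input
  .filter(l => l.headOption.contains(1) && l.tail.length % 2 == 0)
  .map(_.sum)

// One pass: the pattern plus guard does the filtering, the body does the mapping.
val viaCollect = input.collect { case arg @ List(1, rest @ _*) if rest.length % 2 == 0 => arg.sum }

println(viaFilterMap) // List(6)
println(viaCollect)   // List(6)
```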

Upvotes: 0

lpiepiora

Reputation: 13749

How about using a for comprehension:

def getEdges(seq: Seq[Seq[String]]): Seq[Edge] =
  for {
    Seq("e", i, s, t, attrs @ _*) <- seq if attrs.length % 2 == 0
    attrsM = attrs.grouped(2).map { case Seq(a, b) => a -> b }.toMap + ("guid" -> i)
  } yield Edge(i, s, t, attrsM)
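Here the pattern on the left of <- also filters: with Scala 2 semantics (which Scala 3 keeps by default), rows that don't start with "e" or have fewer than four columns simply don't match and are skipped rather than throwing a MatchError. A self-contained run, assuming a hypothetical four-field Edge case class:

```scala
// Hypothetical Edge definition; the question does not show the real one.
case class Edge(id: String, src: String, dst: String, attrs: Map[String, String])

def getEdges(seq: Seq[Seq[String]]): Seq[Edge] =
  for {
    // Rows that don't match this pattern are filtered out, not a MatchError.
    Seq("e", i, s, t, attrs @ _*) <- seq if attrs.length % 2 == 0
    attrsM = attrs.grouped(2).map { case Seq(a, b) => a -> b }.toMap + ("guid" -> i)
  } yield Edge(i, s, t, attrsM)

val rows = Seq(
  Seq("e", "2", "3", "13", "k1", "v1", "k2", "v2"),
  Seq("e", "2", "2", "10", "k1", "v1", "k2"), // odd number of attributes
  Seq("v", "1")                               // not an edge row
)
getEdges(rows).foreach(println)
```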

If you want to print invalid lines (it's not a one-liner any more, but I think it still reads quite nicely):

def validLine(cols: Seq[String]) = if (cols.length % 2 == 0) true else {
  println(s"Invalid line: odd number of attributes in $cols")
  false
}
def getEdges(seq: Seq[Seq[String]]): Seq[Edge] =
  for {
    Seq("e", i, s, t, attrs @ _*) <- seq if validLine(attrs)
    attrsM = attrs.grouped(2).map { case Seq(a, b) => a -> b }.toMap + ("guid" -> i)
  } yield Edge(i, s, t, attrsM)

Upvotes: 0

Mariusz Nosiński

Reputation: 1288

If you don't need information about the dropped lines, you can use this code:

def getEdges(seq: Seq[Seq[String]]): Seq[Edge] =
  seq.filter(sub => (sub.head == "e") && (sub.length % 2 == 0)).map {
    case Seq("e", i, s, t, attrs @ _*) =>
      val attrsM = attrs.grouped(2).collect { case Seq(k, v) => k -> v }.toMap + ("guid" -> i)
      Edge(i, s, t, attrsM)
  }
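Checking sub.length instead of the attribute count works because every edge row has exactly four fixed columns (e, id, source, target), so sub.length is 4 + attrs.size and both have the same parity. One caveat: a row such as e,1 has an even length and starts with "e", so it passes the filter and then fails the pattern in map with a MatchError; collect would skip it instead. A quick check of both points:

```scala
import scala.util.Try

val rows = Seq(
  Seq("e", "2", "3", "13", "k1", "v1", "k2", "v2"), // 8 columns -> 4 attributes
  Seq("e", "2", "2", "10", "k1", "v1", "k2"),       // 7 columns -> 3 attributes
  Seq("e", "1")                                     // 2 columns: even, but too short
)

// Four fixed columns are an even count, so row.size and row.size - 4 share parity.
val parityAgrees = rows.filter(_.size >= 4).forall(r => r.size % 2 == (r.size - 4) % 2)

// The too-short row passes the length filter, then fails the pattern in map.
val kept = rows.filter(sub => sub.head == "e" && sub.length % 2 == 0)
val viaMap = Try(kept.map { case Seq("e", i, _, _, _*) => i })
// collect simply skips rows the pattern does not match.
val viaCollect = kept.collect { case Seq("e", i, _, _, _*) => i }

println(parityAgrees)     // true
println(viaMap.isFailure) // true (scala.MatchError)
println(viaCollect)       // List(2)
```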

Upvotes: 1

Rex Kerr

Reputation: 167901

collect is made for exactly this sort of thing.

// Note--we don't need filter any more as it's part of the condition
seq.collect {
  case Seq("e", i, s, t, attrs @ _*) if checksize(attrs.size) =>
    val attrsM = ...
    Edge(...)
}

And then something like

def checksize(size: Int) = {
  if (size % 2 == 0) true
  else {
    println("Tsk tsk.")
    false
  }
}
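Filled in with the attribute-building code from the question, the collect version could look like the following. It is a sketch: Edge is assumed to be a four-field case class (hypothetical, not shown in the question), and checksize receives the attribute count, attrs.size.

```scala
// Hypothetical Edge definition; the question does not show the real one.
case class Edge(id: String, src: String, dst: String, attrs: Map[String, String])

// Returns true for an even attribute count; logs and returns false otherwise.
def checksize(size: Int): Boolean =
  if (size % 2 == 0) true
  else {
    println(s"Odd number of attributes [$size], skipping")
    false
  }

def getEdges(seq: Seq[Seq[String]]): Seq[Edge] =
  seq.collect {
    // Rows that don't start with "e" or have fewer than 4 columns don't match;
    // rows that match but fail the guard are skipped as well.
    case Seq("e", i, s, t, attrs @ _*) if checksize(attrs.size) =>
      val attrsM = attrs.grouped(2).map { case Seq(k, v) => k -> v }.toMap + ("guid" -> i)
      Edge(i, s, t, attrsM)
  }

val rows = Seq(
  Seq("e", "2", "3", "13", "k1", "v1", "k2", "v2"),
  Seq("e", "2", "2", "10", "k1", "v1", "k2")
)
getEdges(rows).foreach(println)
```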

Upvotes: 2
