Reputation: 3226
I would like to read a CSV String/File in Scala such that, given a case class C and an error type Error, the parser fills an Iterable[Either[Error, C]]. Is there any library that does this or something similar?
For instance, given a class and error
case class Person(name: String, age: Int)
type Error = String
and the CSV String
Foo,19
Ro
Bar,24
the parser would output
Stream(Right(Person("Foo", 19)), Left("Cannot read 'Ro'"), Right(Person("Bar", 24)))
UPDATE:
I think my question wasn't clear, so let me clarify: is there a way to read CSV in Scala without defining boilerplate? Given any case class, is there a way to load it automatically? I would like to use it like this:
val iter = csvParserFor[Person].parseLines(lines)
Upvotes: 20
Views: 12031
Reputation: 61726
Starting with Scala 2.13, it's possible to pattern match Strings by unapplying a string interpolator:
// case class Person(name: String, age: Int)
val csv = "Foo,19\nRo\nBar,24".split("\n")
csv.map {
  case s"$name,$age" => Right(Person(name, age.toInt))
  case line => Left(s"Cannot read '$line'")
}
// Array(Right(Person("Foo", 19)), Left("Cannot read 'Ro'"), Right(Person("Bar", 24)))
Note that you can also use regexes within the extractor.
It could help in our case to consider a row invalid if the age isn't an integer:
// val csv = "Foo,19\nRo\nBar,2R".split("\n")
val Age = "(\\d+)".r
csv.map {
  case s"$name,${Age(age)}" => Right(Person(name, age.toInt))
  case line @ s"$name,$age" => Left(s"Age is not an integer in '$line'")
  case line => Left(s"Cannot read '$line'")
}
//Array(Right(Person("Foo", 19)), Left("Cannot read 'Ro'"), Left("Age is not an integer in 'Bar,2R'"))
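If you would rather not validate the age with a regex, one alternative is `toIntOption`, which was also added in Scala 2.13. The following is a minimal sketch, not a replacement for the code above: `parseLine` is a hypothetical helper name, and unlike the `\d+` regex it also accepts negative numbers such as `-5`.

```scala
case class Person(name: String, age: Int)

// Hypothetical helper: parses one "name,age" line into an Either.
def parseLine(line: String): Either[String, Person] = line match {
  case s"$name,$age" =>
    // toIntOption returns None instead of throwing on non-numeric input
    age.toIntOption
      .map(Person(name, _))
      .toRight(s"Age is not an integer in '$line'")
  case _ => Left(s"Cannot read '$line'")
}

val results = "Foo,19\nRo\nBar,2R".split("\n").toList.map(parseLine)
```

Keeping the logic in a function that returns `Either[String, Person]` makes it easy to reuse with any source of lines, for example `scala.io.Source.fromFile(...).getLines().map(parseLine)`.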
Upvotes: 6
Reputation: 6168
kantan.csv seems like what you want. If you want 0 boilerplate, you can use its shapeless module and write:
import java.io.File
import kantan.csv.ops._
import kantan.csv.generic.codecs._
new File("path/to/csv").asCsvRows[Person](',', false).toList
Which, on your input, will yield:
res2: List[kantan.csv.DecodeResult[Person]] = List(Success(Person(Foo,19)), DecodeFailure, Success(Person(Bar,24)))
Note that the actual return type is an iterator, so you don't have to hold the whole CSV file in memory at any point, as your example does with Stream.
If the shapeless dependency is too much, you can drop it and provide your own case class type classes with minimal boilerplate:
implicit val personCodec = RowCodec.caseCodec2(Person.apply, Person.unapply)(0, 1)
Full disclosure: I'm the author of kantan.csv.
Upvotes: 14
Reputation: 1143
Here's a solution using product-collections:
import com.github.marklister.collections.io._
import scala.util.Try
case class Person(name: String, age: Int)
val csv="""Foo,19
|Ro
|Bar,24""".stripMargin
class TryIterator[T](it: Iterator[T]) extends Iterator[Try[T]] {
  def next() = Try(it.next())
  def hasNext = it.hasNext
}
new TryIterator(CsvParser(Person).iterator(new java.io.StringReader(csv))).toList
res14: List[scala.util.Try[Person]] =
List(Success(Person(Foo,19)), Failure(java.lang.IllegalArgumentException: 1 at line 2 => Ro), Success(Person(Bar,24)))
Apart from the error handling, this gets pretty close to what you were looking for (val iter = csvParserFor[Person].parseLines(lines)):
val iter = CsvParser(Person).iterator(reader)
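Since the question asks for Either rather than Try, the results can be converted with the standard library's `Try#toEither` (available since Scala 2.12). The sketch below uses only the standard library: `toEitherIterator` is a hypothetical helper, and the throwing iterator is a stand-in for a parser iterator. Like TryIterator above, it assumes the underlying iterator can still be used after `next()` has thrown.

```scala
import scala.util.Try

// Hypothetical helper: wraps each call to next() so that an exception
// becomes a Left carrying the exception message.
def toEitherIterator[T](it: Iterator[T]): Iterator[Either[String, T]] =
  new Iterator[Either[String, T]] {
    def hasNext: Boolean = it.hasNext
    def next(): Either[String, T] =
      Try(it.next()).toEither.left.map(_.getMessage)
  }

// Stand-in for a parser iterator that throws on a bad row.
val ages = Iterator("19", "x", "24").map(_.toInt)
val parsed = toEitherIterator(ages).toList
```

Here the second element becomes a Left holding the NumberFormatException message, while the first and third are Right values.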
Upvotes: 2
Reputation: 139058
Here's a Shapeless implementation that takes a slightly different approach from the one in your proposed example. This is based on some code I've written in the past, and the main difference from your implementation is that this one is a little more general—for example the actual CSV parsing part is factored out so that it's easy to use a dedicated library.
First, an all-purpose Read type class (no Shapeless yet):
import scala.util.{ Failure, Success, Try }
trait Read[A] { def reads(s: String): Try[A] }
object Read {
  def apply[A](implicit readA: Read[A]): Read[A] = readA
  implicit object stringRead extends Read[String] {
    def reads(s: String): Try[String] = Success(s)
  }
  implicit object intRead extends Read[Int] {
    def reads(s: String) = Try(s.toInt)
  }
  // And so on...
}
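The "And so on..." comment leaves further instances to the reader. As a sketch of what adding some might look like, here is a compact, standalone copy of the type class with a Double instance and a derived Option instance. Both extra instances, and the convention that an empty cell reads as None, are assumptions for illustration and not part of the code above.

```scala
import scala.util.{ Success, Try }

trait Read[A] { def reads(s: String): Try[A] }

object Read {
  def apply[A](implicit readA: Read[A]): Read[A] = readA

  implicit val intRead: Read[Int] = new Read[Int] {
    def reads(s: String): Try[Int] = Try(s.toInt)
  }

  // Assumed extra instance, not part of the original code.
  implicit val doubleRead: Read[Double] = new Read[Double] {
    def reads(s: String): Try[Double] = Try(s.toDouble)
  }

  // Assumed convention: an empty cell reads as None; anything else
  // is delegated to the Read instance for the underlying type.
  implicit def optionRead[A](implicit readA: Read[A]): Read[Option[A]] =
    new Read[Option[A]] {
      def reads(s: String): Try[Option[A]] =
        if (s.isEmpty) Success(None) else readA.reads(s).map(Some(_))
    }
}

val age = Read[Int].reads("19")
val missing = Read[Option[Double]].reads("")
val bad = Read[Int].reads("twelve")
```

Because the instances live in the companion object, they are found by implicit resolution without any extra import.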
And then for the fun part: a type class that provides a conversion (that may fail) from a list of strings to an HList:
import shapeless._
trait FromRow[L <: HList] { def apply(row: List[String]): Try[L] }
object FromRow {
  import HList.ListCompat._
  def apply[L <: HList](implicit fromRow: FromRow[L]): FromRow[L] = fromRow
  def fromFunc[L <: HList](f: List[String] => Try[L]) = new FromRow[L] {
    def apply(row: List[String]) = f(row)
  }
  // Base case: all fields consumed, so the row must be empty too.
  implicit val hnilFromRow: FromRow[HNil] = fromFunc {
    case Nil => Success(HNil)
    case _ => Failure(new RuntimeException("No more cells expected"))
  }
  // Inductive case: read the head cell, then recurse on the tail.
  implicit def hconsFromRow[H: Read, T <: HList: FromRow]: FromRow[H :: T] =
    fromFunc {
      case h :: t => for {
        hv <- Read[H].reads(h)
        tv <- FromRow[T].apply(t)
      } yield hv :: tv
      case Nil => Failure(new RuntimeException("Expected more cells"))
    }
}
And finally to make it work with case classes:
trait RowParser[A] {
  def apply[L <: HList](row: List[String])(implicit
    gen: Generic.Aux[A, L],
    fromRow: FromRow[L]
  ): Try[A] = fromRow(row).map(gen.from)
}
def rowParserFor[A] = new RowParser[A] {}
Now we can write the following, for example, using OpenCSV:
case class Foo(s: String, i: Int)
import au.com.bytecode.opencsv._
import scala.collection.JavaConverters._
val reader = new CSVReader(new java.io.FileReader("foos.csv"))
val foos = reader.readAll.asScala.map(row => rowParserFor[Foo](row.toList))
And if we have an input file like this:
first,10
second,11
third,twelve
We'll get the following:
scala> foos.foreach(println)
Success(Foo(first,10))
Success(Foo(second,11))
Failure(java.lang.NumberFormatException: For input string: "twelve")
(Note that this conjures up Generic and FromRow instances for every line, but it'd be pretty easy to change that if performance is a concern.)
Upvotes: 20