Reputation: 5728
I'm trying to improve a CSV parsing routine and feel that extractors could be useful here but can't figure them out. Suppose there's a file with user ids and emails:
1,[email protected]
2,[email protected]
3,[email protected]
If the User
class is defined as case class User(id: Int, email: String)
everything is pretty easy with something like
lines map { line =>
line split "," match {
case Array(id, email) => User(id.toInt, email)
}
}
What I don't understand is how to deal with the case where User
class can have complex properties e.g
case class Email(username: String, host: string)
case class User(id: Int, email: Email)
Upvotes: 0
Views: 482
Reputation: 10894
Here you go, an example using a custom Extractor.
// 1,Alice,21212,Baltimore,MD" -> User(1, Alice, Address(21212, Baltimore, MD))
Define a custom Extractor that creates the objects out of given String:
object UserExtractor {
def unapply(s: String) : Option[User] = try {
Some( User(s) )
}
catch {
// bettor handling of bad cases
case e: Throwable => None
}
}
Case classes to hold the data with a custom apply on Comapnion object on User:
case class Address(code: String, cit: String, county: String)
case class User(id: Int, name: String, address: Address)
object User {
def apply(s: String) : User = s.split(",") match {
case Array(id, name, code, city, county) => User(id.toInt, name, Address(code, city, county) )
}
}
Unapplying on a valid string (in the example valid means the correct number of fields).
"1,Alice,21212,Baltimore,MD" match { case UserExtractor(u) => u }
res0: User = User(1,Alice,Address(21212,Baltimore,MD))
More tests could be added with more custom apply methods.
Upvotes: 1
Reputation: 10894
I'd use a single RegexExtractor :
val lines = List(
"1,[email protected]",
"2,[email protected]",
"3,[email protected]"
)
case class Email(username: String, host: String)
case class User(id: Int, email: Email)
val idAndEMail = """^([^,]+),([^@]+)@(.+)$""".r
and define a function that transforms a line to the an User :
def lineToUserWithMail(line: String) : Option[User] =
idAndEMail.findFirstMatchIn(line) map {
case userWithEmail(id,user,host) => User(id.toInt, Email(user,host) )
}
Applying the function to all lines
lines flatMap lineToUserWithMail
//res29: List[User] = List(User(1,Email(alice,alice.com)), User(2,Email(bob,bob.com)), User(3,Email(carol,carol.com)))
Alternatively you could implement custom Extractors on the case classe by adding an unnapply Method. But for that case it wouldn't be worth the pain.
Here is an example for unapply
class Email(user:String, host:String)
object Email {
def unapply(s: String) : Option[(String,String)] = s.split("@") match {
case Array(user, host) => Some( (user,host) )
case _ => None
}
}
"[email protected]" match {
case Email(u,h) => println( s"$u , $h" )
}
// prints bob , bob.com
A word of warning on using Regex to parse CSV-data. It's not as easy as you might think, i would recommend to use a CSV-Reader as http://supercsv.sourceforge.net/ which handles some nasty edge cases out of the box.
Upvotes: 0
Reputation: 52681
You probably want to use a regular expression to extract the contents of the email address. Maybe something like this:
val lines = Vector(
"1,[email protected]",
"2,[email protected]",
"3,[email protected]")
case class Email(username: String, host: String)
case class User(id: Int, email: Email)
val EmailRe = """(.*)@(.*\.com)""".r // substitute a real email address regex here
lines.map { line =>
line.split(",") match {
case Array(id, EmailRe(uname, host)) => User(id.toInt, Email(uname, host))
}
}
Upvotes: 2