synapse

Reputation: 5728

Using extractors to parse text files

I'm trying to improve a CSV parsing routine and feel that extractors could be useful here but can't figure them out. Suppose there's a file with user ids and emails:

1,alice@alice.com
2,bob@bob.com
3,carol@carol.com

If the User class is defined as case class User(id: Int, email: String) everything is pretty easy with something like

lines map { line =>
  line split "," match {
    case Array(id, email) => User(id.toInt, email)
  }
}

What I don't understand is how to deal with the case where the User class has complex properties, e.g.

case class Email(username: String, host: String)
case class User(id: Int, email: Email)

Upvotes: 0

Views: 482

Answers (3)

Andreas Neumann

Reputation: 10894

Here you go, an example using a custom Extractor.

// "1,Alice,21212,Baltimore,MD" -> User(1, Alice, Address(21212, Baltimore, MD))

Define a custom Extractor that creates the objects out of given String:

import scala.util.control.NonFatal

object UserExtractor {
    def unapply(s: String): Option[User] = try {
        Some(User(s))
    }
    catch {
        // better handling of bad cases could go here; any non-fatal failure means "no match"
        case NonFatal(_) => None
    }
}

Case classes to hold the data, with a custom apply on the companion object of User:

case class Address(code: String, city: String, county: String)

case class User(id: Int, name: String, address: Address)
object User {
    def apply(s: String): User = s.split(",") match {
        case Array(id, name, code, city, county) => User(id.toInt, name, Address(code, city, county))
    }
}

Unapplying on a valid string (in this example, valid means the correct number of fields and a numeric id):

"1,Alice,21212,Baltimore,MD" match { case UserExtractor(u) => u }
res0: User = User(1,Alice,Address(21212,Baltimore,MD))

More validation checks could be added with further custom apply methods.
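
To see what happens with malformed input, here is a minimal sketch that assumes the same User and Address case classes as above. The describe helper and the wrapping ExtractorFallbackSketch object are made up for illustration, and Try is swapped in for the explicit try/catch:

```scala
import scala.util.Try

// Wrapper object so the sketch compiles on its own; the name is illustrative.
object ExtractorFallbackSketch {
  case class Address(code: String, city: String, county: String)
  case class User(id: Int, name: String, address: Address)

  object User {
    // Throws MatchError on a wrong field count and
    // NumberFormatException on a non-numeric id.
    def apply(s: String): User = s.split(",") match {
      case Array(id, name, code, city, county) =>
        User(id.toInt, name, Address(code, city, county))
    }
  }

  object UserExtractor {
    // Try turns either exception into None, i.e. "no match".
    def unapply(s: String): Option[User] = Try(User(s)).toOption
  }

  // Hypothetical helper: the fallback case keeps the match total.
  def describe(line: String): String = line match {
    case UserExtractor(u) => s"parsed user ${u.id} from ${u.address.city}"
    case _                => "rejected"
  }
}
```

Without the fallback case, a line that the extractor rejects would raise a MatchError at the match site instead of being handled.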

Upvotes: 1

Andreas Neumann

Reputation: 10894

I'd use a single regex extractor:

val lines = List(
  "1,alice@alice.com",
  "2,bob@bob.com",
  "3,carol@carol.com"
)

case class Email(username: String, host: String)
case class User(id: Int, email: Email)


val idAndEMail = """^([^,]+),([^@]+)@(.+)$""".r

and define a function that transforms a line into a User:

def lineToUserWithMail(line: String): Option[User] =
  idAndEMail.findFirstMatchIn(line) map {
    case idAndEMail(id, user, host) => User(id.toInt, Email(user, host))
  }

Applying the function to all lines

lines flatMap lineToUserWithMail
//res29: List[User] = List(User(1,Email(alice,alice.com)), User(2,Email(bob,bob.com)), User(3,Email(carol,carol.com)))

Alternatively, you could implement custom extractors on the case classes by adding an unapply method. But for this case it wouldn't be worth the pain.

Here is an example of unapply:

class Email(user: String, host: String)
object Email {
  // Splits "user@host" into its two parts; anything else does not match.
  def unapply(s: String): Option[(String, String)] = s.split("@") match {
    case Array(user, host) => Some((user, host))
    case _ => None
  }
}

"bob@bob.com" match {
  case Email(u, h) => println(s"$u , $h")
}
// prints bob , bob.com
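
Such an extractor can also be nested inside the Array pattern from the question, which is one way to build the complex User in a single match. A minimal sketch, assuming the Email and User case classes from the question; EmailString, IntString, parseLine and the wrapping object are hypothetical names introduced here:

```scala
import scala.util.Try

// Wrapper object so the sketch compiles on its own; the name is illustrative.
object NestedExtractorSketch {
  case class Email(username: String, host: String)
  case class User(id: Int, email: Email)

  // Hypothetical extractor: matches strings of the form "user@host".
  object EmailString {
    def unapply(s: String): Option[(String, String)] = s.split("@") match {
      case Array(user, host) => Some((user, host))
      case _                 => None
    }
  }

  // Hypothetical extractor: matches strings that parse as an Int.
  object IntString {
    def unapply(s: String): Option[Int] = Try(s.trim.toInt).toOption
  }

  // Extractors nest: each array element is matched by its own extractor.
  def parseLine(line: String): Option[User] = line.split(",") match {
    case Array(IntString(id), EmailString(user, host)) =>
      Some(User(id, Email(user, host)))
    case _ => None
  }
}
```

For example, NestedExtractorSketch.parseLine("2,bob@bob.com") yields Some(User(2,Email(bob,bob.com))), while a line with a non-numeric id or without an @ yields None.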

A word of warning on using regexes to parse CSV data: it's not as easy as you might think. I would recommend using a CSV reader such as http://supercsv.sourceforge.net/, which handles some nasty edge cases out of the box.
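
One such edge case is a quoted field that itself contains a comma. A small sketch with a made-up row (the data and the object name are purely illustrative) shows how a plain split miscounts the fields:

```scala
// Wrapper object so the sketch compiles on its own; the name is illustrative.
object QuotedFieldSketch {
  // Three logical fields, but the second one is quoted and contains a comma.
  val row = """1,"Doe, Jane",jane@example.com"""

  // A naive split cuts inside the quotes and yields four pieces.
  val naive: Array[String] = row.split(",")
}
```

Here naive has four elements instead of three, so a pattern such as Array(id, name, email) would not match the row.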

Upvotes: 0

dhg

Reputation: 52681

You probably want to use a regular expression to extract the contents of the email address. Maybe something like this:

val lines = Vector(
  "1,alice@alice.com",
  "2,bob@bob.com",
  "3,carol@carol.com")

case class Email(username: String, host: String)
case class User(id: Int, email: Email)

val EmailRe = """(.*)@(.*\.com)""".r  // substitute a real email address regex here

lines.map { line =>
  line.split(",") match {
    case Array(id, EmailRe(uname, host)) => User(id.toInt, Email(uname, host))
  }
}
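
Note that the match above throws a MatchError on any line that does not fit the pattern. If bad lines should simply be skipped, one option is collect instead of map. A minimal sketch, where IdRe, the sample bad lines and the wrapping object are assumptions added for illustration:

```scala
// Wrapper object so the sketch compiles on its own; the name is illustrative.
object SafeParseSketch {
  case class Email(username: String, host: String)
  case class User(id: Int, email: Email)

  val IdRe    = """(\d{1,9})""".r       // at most 9 digits, so toInt cannot overflow
  val EmailRe = """(.*)@(.*\.com)""".r  // same placeholder regex as above

  val lines = Vector(
    "1,alice@alice.com",
    "oops",            // wrong number of fields
    "2,not-an-email",  // second field does not match EmailRe
    "3,carol@carol.com")

  // collect keeps only the lines for which the pattern matches.
  val users: Vector[User] = lines.map(_.split(",")).collect {
    case Array(IdRe(id), EmailRe(uname, host)) => User(id.toInt, Email(uname, host))
  }
}
```

With this input, users contains only the entries for ids 1 and 3; the two malformed lines are dropped silently, which may or may not be what you want.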

Upvotes: 2
