Pär Eriksson

Reputation: 367

Apache Spark: map CSV file to key-value format

I'm totally new to Apache Spark and Scala, and I'm having problems with mapping a .csv file into a key-value (like JSON) structure.

What I want to accomplish is to turn this .csv file:

user, timestamp, event
ec79fcac8c76ebe505b76090f03350a2,2015-03-06 13:52:56,USER_PURCHASED
ad0e431a69cb3b445ddad7bb97f55665,2015-03-06 13:52:57,USER_SHARED
83b2d8a2c549fbab0713765532b63b54,2015-03-06 13:52:57,USER_SUBSCRIBED
ec79fcac8c76ebe505b76090f03350a2,2015-03-06 13:53:01,USER_ADDED_TO_PLAYLIST
...

Into a structure like:

ec79fcac8c76ebe505b76090f03350a2: [(2015-03-06 13:52:56,USER_PURCHASED), (2015-03-06 13:53:01,USER_ADDED_TO_PLAYLIST)]
ad0e431a69cb3b445ddad7bb97f55665: [(2015-03-06 13:52:57,USER_SHARED)]
83b2d8a2c549fbab0713765532b63b54: [(2015-03-06 13:52:57,USER_SUBSCRIBED)]
...

How can this be done if the file is read by:

val csv = sc.textFile("file.csv")

Help is very much appreciated!

Upvotes: 1

Views: 2244

Answers (2)

Daniel Langdon

Reputation: 5999

Something like:

     case class MyClass(user: String, date: String, event: String)

     // Parse one CSV line into a MyClass instance.
     // Note: a header line, if present, is parsed like any other row.
     def csvToMyClass(line: String): MyClass = {
        val split = line.split(',')
        // This is a good place to do validations
        // and convert strings to numbers, enums, UUIDs, etc.
        MyClass(split(0), split(1), split(2))
     }

     val csv = sc.textFile("file.csv")
        .map(csvToMyClass)

Of course, do a little more work to have more concrete data types on your class rather than just strings...
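As a rough illustration of what "more concrete data types" could look like, here is a minimal sketch that parses the timestamp into a `java.time.LocalDateTime` and returns `None` for rows that do not fit. The name `UserEvent` and the timestamp pattern are assumptions based on the sample rows in the question, not something the original code defines.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical typed variant of MyClass; the name UserEvent is illustrative only.
case class UserEvent(user: String, timestamp: LocalDateTime, event: String)

object UserEvent {
  // Assumed pattern, matching timestamps such as "2015-03-06 13:52:56" in the sample file.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Returns None for the header line or any malformed row instead of throwing.
  def parse(line: String): Option[UserEvent] =
    line.split(',').map(_.trim) match {
      case Array(user, ts, event) =>
        Try(LocalDateTime.parse(ts, fmt)).toOption.map(UserEvent(user, _, event))
      case _ => None
    }
}
```

With this in place, something like `sc.textFile("file.csv").flatMap(line => UserEvent.parse(line))` should silently skip the header and bad rows. In the Spark shell, the case class and its companion object need to be entered together (for example with `:paste`).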

This is for reading the CSV file into a structure (seems to be your main question). If you then need to merge all data for a single user you can map to a key/value tuple (String -> (String, String)) instead and use .aggregateByKey() to join all tuples for a user. Your aggregation function can then return whatever structure you want.
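The key/value approach could be sketched roughly as follows, assuming Spark 1.3 or later (so that pair-RDD operations are available without extra imports) and a non-empty file whose first line is a header. The names `GroupEvents`, `toPair`, and `eventsByUser` are made up for this example.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object; none of these names come from the Spark API.
object GroupEvents {
  // A (timestamp, event) pair kept as strings, as in the question's desired output.
  type Entry = (String, String)

  // Turns a CSV line into (user, (timestamp, event)); rows without exactly three fields are dropped.
  def toPair(line: String): Option[(String, Entry)] =
    line.split(',').map(_.trim) match {
      case Array(user, ts, event) => Some((user, (ts, event)))
      case _ => None
    }

  def eventsByUser(sc: SparkContext, path: String): RDD[(String, List[Entry])] = {
    val lines = sc.textFile(path)
    val header = lines.first() // assumes the file is non-empty and starts with a header
    lines
      .filter(_ != header)
      .flatMap(line => toPair(line))
      .aggregateByKey(List.empty[Entry])(
        (acc, entry) => entry :: acc, // add one entry to the list built within a partition
        (a, b) => a ::: b             // concatenate lists built on different partitions
      )
  }
}
```

The order of entries within each list is not guaranteed, so if you need them chronologically you would probably want to sort afterwards, for example with `mapValues(_.sorted)`.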

Upvotes: 1

szefuf

Reputation: 520

Daniel is right.

After that, you just have to group by user:

csv.keyBy(_.user).groupByKey

That gives you one entry per user, holding all of that user's records.

Upvotes: 0
