R. Gabriel

Reputation: 75

Manipulate CSV with Scala Spark

I have a CSV file that is "semi-structured":

canal,username,email,age
facebook,pepe22,[email protected],24
twitter,foo-24,[email protected],33
facebook,caty24,,22

Suppose I want to load the first, second, and third columns into an RDD of type org.apache.spark.rdd.RDD[(String, String, String)].

I am really new to this. I'm using Spark 1.4.1, and this is as far as my code gets:

val rdd = sc.textFile("/user/ergorenova/socialmedia/allus/test").map(_.split(","))

Can someone help me?

I would really appreciate it

Upvotes: 0

Views: 73

Answers (1)

Radu Ionescu

Reputation: 3532

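val rdd = sc.textFile("/user/ergorenova/socialmedia/allus/test")
            // a limit of -1 keeps trailing empty fields instead of dropping them
            .map(_.split(",", -1) match {
               // three-field rows: keep every column
               case Array(canal, username, email) => (canal, username, email)
               // four-field rows: ignore the trailing age column
               case Array(canal, username, email, _) => (canal, username, email)
            })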

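This gives you an RDD[(String, String, String)] whose tuples hold the first, second, and third columns. Note that the header line is matched as well, so it ends up in the RDD unless you filter it out, and a line with any other number of fields throws a scala.MatchError.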
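If you would rather skip the header and silently drop malformed lines, one possible variant is sketched below. The helper name firstThree and the in-memory sample lines are made up for illustration; the sample stands in for the contents of your file so the snippet can be pasted into spark-shell or a plain Scala REPL without touching HDFS.

```scala
// Hypothetical helper (not part of Spark): keep the first three fields of a CSV line.
// Returns None when the line has fewer than three fields, so such rows can be dropped.
def firstThree(line: String): Option[(String, String, String)] =
  line.split(",", -1) match {
    // _* absorbs any remaining columns (age, or anything added later)
    case Array(canal, username, email, _*) => Some((canal, username, email))
    case _                                 => None
  }

// Sample data assumed for illustration only.
val header = "canal,username,email,age"
val lines = Seq(
  header,
  "facebook,caty24,,22",
  "broken"
)

// Drop the header, then keep only the lines that parsed successfully.
val rows = lines.filter(_ != header).flatMap(line => firstThree(line))
```

On an RDD[String] such as the one returned by sc.textFile, the same filter and flatMap calls should apply unchanged, giving an RDD[(String, String, String)]. Comparing every line against the header text assumes that no data row is identical to the header.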

Upvotes: 1
