Reputation: 75
I have a CSV file that is "semi-structured":
canal,username,email,age
facebook,pepe22,[email protected],24
twitter,foo-24,[email protected],33
facebook,caty24,,22
Suppose I want the first, second, and third columns in an RDD of type org.apache.spark.rdd.RDD[(String, String, String)].
I am really new to this. I'm using Spark 1.4.1, and my code gets this far:
val rdd = sc.textFile("/user/ergorenova/socialmedia/allus/test").map(_.split(","))
Can someone help me?
I would really appreciate it.
Upvotes: 0
Views: 73
Reputation: 3532
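val rdd = sc.textFile("/user/ergorenova/socialmedia/allus/test")
  // A limit of -1 keeps trailing empty fields instead of dropping them
  .map(_.split(",", -1) match {
    // Row without an age column
    case Array(canal, username, email) => (canal, username, email)
    // Row with an age column, which is discarded
    case Array(canal, username, email, _) => (canal, username, email)
  })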
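This gives you an RDD of tuples built from the first, second, and third columns. Passing -1 as the limit to split keeps trailing empty fields, so a row that ends in a comma still yields the expected number of columns. Note that the header line is matched like any other row, and a line with any other number of fields throws a scala.MatchError.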
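If the file may contain short or malformed lines, a more defensive variant could return an Option and let flatMap drop the lines that do not fit. The following is only a sketch in plain Scala (no Spark needed to try it): the helper name firstThree, the sample lines, and the placeholder e-mail address are assumptions made for illustration, not part of the original question or answer.

```scala
// Hypothetical helper: returns the first three comma-separated fields,
// or None when the line has fewer than three fields.
def firstThree(line: String): Option[(String, String, String)] =
  line.split(",", -1) match {
    // _* ignores any further columns, such as age
    case Array(canal, username, email, _*) => Some((canal, username, email))
    case _                                 => None
  }

// Sample lines shaped like the file in the question (e-mail is a placeholder)
val sample = Seq(
  "canal,username,email,age",
  "facebook,pepe22,pepe@example.com,24",
  "facebook,caty24,,22",
  "broken-line"
)

// Skip the header, then keep only the lines that parsed successfully
val header = sample.head
val rows = sample.filter(_ != header).flatMap(line => firstThree(line))
// rows: Seq((facebook,pepe22,pepe@example.com), (facebook,caty24,))
```

The same function should work on an RDD, for example sc.textFile(path).flatMap(line => firstThree(line)), after filtering out the header line in the same way.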
Upvotes: 1