Reputation: 120
I have an RDD, RDD1, with the following schema: RDD[(String, Array[String])]. I would like to create a new RDD, RDD2, of type RDD[(String, String)], where each row pairs a key from RDD1 with one of that key's values.
For example:
RDD1 = Array(("Fruit",("Orange","Apple","Peach")),("Shape",("Square","Rectangle")),("Mathematician",("Aryabhatt")))
I want the output to be as:
RDD2 = Array(("Fruit","Orange"),("Fruit","Apple"),("Fruit","Peach"),("Shape","Square"),("Shape","Rectangle"),("Mathematician","Aryabhatt"))
Can someone help me with this piece of code?
My attempt:
val R1 = RDD1.map(line => (line._1,line._2.split((","))))
val R2 = R1.map(line => line._2.foreach(ph => ph.map(line._1)))
This gives me an error:
error: value map is not a member of Char
I understand that this is because the map function is only applicable to RDDs and not to each string/char. Please help me with a way to use nested functions for this purpose in Spark.
Upvotes: 0
Views: 353
Reputation: 21485
Break down the problem.
("Fruit",Array("Orange","Apple","Peach"))
-> Array(("Fruit", "Orange"), ("Fruit", "Apple"), ("Fruit", "Peach"))
def flattenLine(line: (String, Array[String])) = line._2.map(x => (line._1, x))
rdd1.flatMap(flattenLine)
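To check the idea without a cluster, here is a minimal sketch that uses a plain Scala Seq in place of the RDD. The names rows and pairs are made up for illustration, and the sample data is taken from the question. It assumes the values are already an Array[String]. With a real SparkContext the same flattenLine should plug into rdd1.flatMap in the same way.

```scala
// Turn one (key, values) row into one (key, value) pair per value.
def flattenLine(line: (String, Array[String])): Array[(String, String)] =
  line._2.map(x => (line._1, x))

// Hypothetical stand-in for RDD1: a local Seq instead of an RDD.
val rows: Seq[(String, Array[String])] = Seq(
  ("Fruit", Array("Orange", "Apple", "Peach")),
  ("Shape", Array("Square", "Rectangle")),
  ("Mathematician", Array("Aryabhatt"))
)

// flatMap applies flattenLine to every row and concatenates the results,
// so the nested arrays collapse into a single flat sequence of pairs.
val pairs: Seq[(String, String)] = rows.flatMap(line => flattenLine(line).toSeq)

pairs.foreach(println)
```

Using map instead of flatMap here would leave one array of pairs per row; flatMap is what removes that extra level of nesting.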
Upvotes: 4