Reputation: 417
I have following Movies data which is like below,
I should get count of movies in each year like 2002,2 and 2004,1
Littlefield, John (I) x House 2002
Houdyshell, Jayne demon State 2004
Houdyshell, Jayne mall in Manhattan 2002
val data=sc.textFile("..line to file")
val dataSplit=data.map(line=>{var d=line.split("\t");(d(0),d(1),d(2))})
What i am unable to understand is when i use dataSplit.take(2).foreach(println) I see that d(0) is first two columns Littlefield, John (I) which are firstname and lastname and d(1) is movie name such as "x House" and d(2) is year. How can i get the count of movies each year?
Upvotes: 0
Views: 77
Reputation: 13581
Use reduceByKey
with the mapped tuple in this way.
val dataSplit = data
.map(line => {var d = line.split("\t"); (d(2), 1)}) // (2002, 1)
.reduceByKey((a, b) => a + b)
// .collect() gives the result: Array((2004,1), (2002,2))
Upvotes: 1