user2708013

Reputation: 417

How to get a count per year using Spark with Scala

I have the following movie data, and I need the number of movies in each year, e.g. 2002,2 and 2004,1.

Littlefield, John (I)   x House 2002
Houdyshell, Jayne   demon State 2004
Houdyshell, Jayne   mall in Manhattan   2002

val data = sc.textFile("..line to file")
val dataSplit = data.map(line => { val d = line.split("\t"); (d(0), d(1), d(2)) })

What I am unable to understand is this: when I run dataSplit.take(2).foreach(println), I see that d(0) holds the first two columns, e.g. "Littlefield, John (I)", which are the first name and last name; d(1) is the movie name, such as "x House"; and d(2) is the year. How can I get the count of movies for each year?

Upvotes: 0

Views: 77

Answers (1)

Lamanus

Reputation: 13581

Map each line to a (year, 1) tuple, then sum the values per year with reduceByKey:

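val dataSplit = data
  .map(line => { val d = line.split("\t"); (d(2), 1) }) // e.g. ("2002", 1); the year stays a String
  .reduceByKey((a, b) => a + b)                         // sum the 1s for each year

// .collect() gives a result such as Array((2004,1), (2002,2)); the order of the pairs is not guaranteed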
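
To check the parsing without a Spark context, the same idea can be sketched on a plain Scala collection. This is only an illustration, not the Spark code itself: it assumes the three sample lines from the question are tab-separated, it swaps the RDD for a local Seq, and it uses groupBy plus a sum as a stand-in for reduceByKey. The names YearCounts, sampleLines and countByYear are made up for this example.

```scala
object YearCounts {
  // Sample lines from the question; the tab separators are an assumption.
  val sampleLines: Seq[String] = Seq(
    "Littlefield, John (I)\tx House\t2002",
    "Houdyshell, Jayne\tdemon State\t2004",
    "Houdyshell, Jayne\tmall in Manhattan\t2002"
  )

  // Hypothetical helper: count how many lines fall in each year.
  def countByYear(lines: Seq[String]): Map[String, Int] = {
    // Pair the third tab-separated field (the year) with 1, as in the map step.
    val pairs = lines.map { line =>
      val fields = line.split("\t")
      (fields(2).trim, 1)
    }
    // Local stand-in for reduceByKey: group the pairs by year and sum the 1s.
    pairs.groupBy(_._1).map { case (year, ps) => (year, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    // Sort by year so the printed order is stable.
    countByYear(sampleLines).toSeq.sortBy(_._1).foreach {
      case (year, n) => println(s"$year,$n")
    }
    // prints 2002,2 and then 2004,1
  }
}
```

The name field "Littlefield, John (I)" contains a comma but no tab, so splitting on "\t" keeps it as one field and the year is always d(2). On a real RDD, reduceByKey is likely the better fit than grouping, since it combines values per key before shuffling.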

Upvotes: 1
