Sandip Armal Patil

Reputation: 5905

Get count of common strings from two RDDs in Scala

I have two RDDs, an RDD[String] and an RDD[(String, String)], with the following contents:

RDD[String]                     RDD[(String, String)]
mobile                          laptop,aa
smartphone                      printer,bb
desktop                         scanner,ya
laptop                          mobile,gb
printer                         burger,gn

I need to intersect these two RDDs and count the common keywords. The output should be 3, because printer, laptop and mobile appear in both.

I tried intersection() but couldn't get it to work. I have done this with arrays, but I don't know how to do it with RDDs (and I need to work with RDDs).

Here is what I have tried:

tokenArray.intersect(param._1.split("/")).size > 2
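
For comparison, the same count over plain in-memory collections can be sketched as below. This assumes both inputs fit in local arrays; the names `keywords`, `pairs`, `commonCount` and `LocalIntersectCount` are illustrative and not from the question. Converting both sides to `Set` drops duplicates, which matches what `RDD.intersection` does.

```scala
object LocalIntersectCount {
  // Left column of the question: the keywords
  val keywords: Array[String] =
    Array("mobile", "smartphone", "desktop", "laptop", "printer")

  // Right column of the question: (keyword, code) pairs
  val pairs: Array[(String, String)] = Array(
    ("laptop", "aa"), ("printer", "bb"), ("scanner", "ya"),
    ("mobile", "gb"), ("burger", "gn"))

  // Take the keyword from each pair, then intersect as sets so that
  // each common keyword is counted once
  def commonCount: Int =
    keywords.toSet.intersect(pairs.map(_._1).toSet).size

  def main(args: Array[String]): Unit =
    println(commonCount) // prints 3
}
```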

Please give me a reference or a hint.

Answers (1)

Till Rohrmann

Reputation: 13346

Does the following solve your problem?

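// The keyword list as an RDD[String]
val keywords = sc.parallelize(Seq("mobile", "smartphone", "desktop", "laptop", "printer"))
// The pairs as an RDD[(String, String)]
val data = sc.parallelize(Seq(("laptop", "aa"), ("printer", "bb"), ("scanner", "ya"),
  ("mobile", "gb"), ("burger", "gn")))

// Keep only the first element (the keyword) of each pair
val keysInData = data.map(_._1)

// intersection() returns the distinct elements present in both RDDs; count() gives 3 here
val result = keywords.intersection(keysInData).count()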
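
If the keyword list is small, one alternative is to broadcast it as a set and filter the keys on the executors. `intersection` shuffles both RDDs internally; with this approach only the already-filtered keys are shuffled, by `distinct()`, which is kept so that repeated keys are counted once, as with `intersection`. This is a sketch under assumptions: the object and method names (`BroadcastKeywordCount`, `commonKeywordCount`), the app name and the `local[*]` master are illustrative, and `spark-core` must be on the classpath. In `spark-shell`, you would reuse the existing `sc` instead of creating one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastKeywordCount {
  // Hypothetical helper: counts keywords that also appear as keys in the pair RDD
  def commonKeywordCount(sc: SparkContext): Long = {
    val keywords = sc.parallelize(Seq("mobile", "smartphone", "desktop", "laptop", "printer"))
    val data = sc.parallelize(Seq(("laptop", "aa"), ("printer", "bb"), ("scanner", "ya"),
      ("mobile", "gb"), ("burger", "gn")))

    // Collect the (assumed small) keyword RDD to the driver and broadcast it once
    val keywordSet = sc.broadcast(keywords.collect().toSet)

    // Keep keys found in the keyword set; distinct() counts each keyword once
    data.map(_._1).filter(k => keywordSet.value.contains(k)).distinct().count()
  }

  def main(args: Array[String]): Unit = {
    // Local-mode context for illustration only
    val conf = new SparkConf().setAppName("keyword-count").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(commonKeywordCount(sc)) // prints 3
    finally sc.stop()
  }
}
```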
