Reputation: 371
I am trying to find the intersection of two RDD[String]s using Apache Spark's intersection method, but it returns an empty Array.
val d = sc.parallelize(Seq("web services as a software","RCB vs CSK"))
val d1 = sc.parallelize(Seq("software as a services", "CSK vs RCB"))
d.intersection(d1).collect
Output
res6: Array[String] = Array()
Upvotes: 0
Views: 384
Reputation: 8996
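intersection compares whole elements, and none of the full sentences in d appears verbatim in d1, so the result is empty. You are missing the part where you split the sentences into words: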
val d = sc.parallelize(Seq("web services as a software","RCB vs CSK")).flatMap(_.split(" "))
val d1 = sc.parallelize(Seq("software as a services", "CSK vs RCB")).flatMap(_.split(" "))
d.intersection(d1).collect
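To make the difference concrete, here is a minimal sketch that runs both versions side by side. It assumes a local Spark session (the `local[*]` master and the app name are my choices, as are the names `wholeSentences` and `commonWords`); in spark-shell, `getOrCreate()` should simply reuse the session you already have.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("intersection-sketch")
  .getOrCreate()
val sc = spark.sparkContext

val d  = sc.parallelize(Seq("web services as a software", "RCB vs CSK"))
val d1 = sc.parallelize(Seq("software as a services", "CSK vs RCB"))

// Whole-sentence comparison: no sentence appears in both RDDs, so this is empty.
val wholeSentences = d.intersection(d1).collect().toSet

// Word-level comparison: split each sentence on spaces first, then intersect.
val commonWords = d.flatMap(_.split(" "))
  .intersection(d1.flatMap(_.split(" ")))
  .collect()
  .toSet

println(wholeSentences)
println(commonWords)
```

The second result should contain every word except "web", which only occurs in d. I convert to a Set because intersection returns distinct elements and does not guarantee their order, so comparing arrays position by position would be fragile.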
Upvotes: 1