Karan

Reputation: 371

Intersection not working in Apache Spark

I am trying to find the intersection of two RDD[String]s using Apache Spark's intersection method, but it returns an empty Array.

val d = sc.parallelize(Seq("web services as a software", "RCB vs CSK"))

val d1 = sc.parallelize(Seq("software as a services", "CSK vs RCB"))

d.intersection(d1).collect

Output

res6: Array[String] = Array()

Upvotes: 0

Views: 384

Answers (1)

marios

Reputation: 8996

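intersection compares whole elements, and neither sentence in d appears verbatim in d1, so the result is empty. You are missing the part where you split the sentences into words: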

val d = sc.parallelize(Seq("web services as a software", "RCB vs CSK")).flatMap(_.split(" "))

val d1 = sc.parallelize(Seq("software as a services", "CSK vs RCB")).flatMap(_.split(" "))

d.intersection(d1).collect
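To see the difference without starting a SparkContext, here is a minimal sketch that uses plain Scala collections in place of RDDs (an assumption for illustration only). Seq.intersect matches whole elements just as RDD.intersection does. Unlike RDD.intersection, however, it keeps duplicates, so the sketch calls distinct to mimic Spark's de-duplicated output. The names IntersectionDemo, sentenceOverlap and wordOverlap are hypothetical.

```scala
// Plain Scala collections, no Spark needed: shows why intersecting whole
// sentences finds nothing while intersecting their words does.
object IntersectionDemo {
  val d  = Seq("web services as a software", "RCB vs CSK")
  val d1 = Seq("software as a services", "CSK vs RCB")

  // Elements are compared as whole strings, so no sentence matches
  val sentenceOverlap: Seq[String] = d.intersect(d1)

  // Split each sentence into words first; distinct mimics the fact that
  // RDD.intersection returns each common element only once
  val wordOverlap: Seq[String] =
    d.flatMap(_.split(" ")).distinct
      .intersect(d1.flatMap(_.split(" ")).distinct)
      .sorted // sorted only to make the printed order predictable

  def main(args: Array[String]): Unit = {
    println(sentenceOverlap) // List()
    println(wordOverlap)     // List(CSK, RCB, a, as, services, software, vs)
  }
}
```

Every word from d except "web" also occurs in d1, which is why the split version returns seven words. The order returned by collect on a real RDD is not guaranteed, so sort the result if you need a stable order.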

Upvotes: 1
