fehorak

Reputation: 11

Computing the set difference of two RDDs of Array[Int]

I have two RDDs, A and B, both of type RDD[Array[Int]], and I want to compute the set differences A - B and B - A. I tried the following code:

val R1 = A.subtract(B)
val R2 = B.subtract(A)

but it does not return the correct answer. A previous answer says that "Performing set operations like subtract with mutable types (Array in this example) is usually unsupported, or at least not recommended." So I changed the code to

import scala.collection.mutable.ArrayBuffer

val A1 = A.map(_.to[ArrayBuffer]).persist()
val B1 = B.map(_.to[ArrayBuffer]).persist()
val R1 = A1.subtract(B1)
val R2 = B1.subtract(A1)

Now it returns the correct answer. I want to know if there is any more efficient way to do this.

Upvotes: 0

Views: 121

Answers (1)

simpadjo

Reputation: 4017

The linked answer is misleading. The problem is not mutability: ArrayBuffer, which solved the problem, is mutable as well.

subtract internally compares elements using equals (and hashCode), and Java arrays do not override either of them. Both default to reference identity, so two distinct arrays with the same contents are never treated as equal.

A.map(_.toSeq).subtract(B.map(_.toSeq)) will work.

.toSeq wraps the Java array in Scala's WrappedArray, which has a less surprising implementation of equality: its equals and hashCode are based on the elements rather than on identity.
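
The difference can be seen without Spark. The following is a minimal plain-Scala sketch, not Spark's implementation: the object name ArrayEqualityDemo and the helper difference are hypothetical, and difference is only a local, hash-set-based stand-in for what subtract does on an RDD.

```scala
object ArrayEqualityDemo {
  // Hypothetical local stand-in for RDD.subtract: keeps the elements of
  // `left` that are not found in `right`, using hashCode/equals lookups.
  def difference[T](left: Seq[T], right: Seq[T]): Seq[T] = {
    val rightSet = right.toSet
    left.filterNot(x => rightSet.contains(x))
  }

  def main(args: Array[String]): Unit = {
    val x = Array(1, 2, 3)
    val y = Array(1, 2, 3)

    // Arrays inherit equals from java.lang.Object, so == is identity only.
    println(x == y)
    // Element-wise comparison has to be requested explicitly...
    println(x.sameElements(y))
    // ...or obtained by wrapping the arrays in a Seq.
    println(x.toSeq == y.toSeq)

    val a = Seq(Array(1, 2), Array(3, 4))
    val b = Seq(Array(3, 4), Array(5, 6))

    // With raw arrays nothing is removed: Array(3, 4) in `a` and in `b`
    // are different objects with different identity hash codes.
    println(difference(a, b).length)

    // With Seq elements, equal contents match and only Seq(1, 2) remains.
    println(difference(a.map(_.toSeq), b.map(_.toSeq)) == Seq(Seq(1, 2)))
  }
}
```

The same reasoning explains why the ArrayBuffer version in the question works: ArrayBuffer is also a Seq, so it compares and hashes by content.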

Upvotes: 1
