I have two RDDs, say A and B, of type RDD[Array[Int]],
and I want to compute the set differences A - B and B - A. I tried the following code:
val R1 = A.subtract(B)
val R2 = B.subtract(A)
but it does not return the correct answer. A previous answer mentions that "Performing set operations like subtract with mutable types (Array in this example) is usually unsupported, or at least not recommended." So I changed the code to:
import scala.collection.mutable.ArrayBuffer
val A1 = A.map(_.to[ArrayBuffer]).persist()
val B1 = B.map(_.to[ArrayBuffer]).persist()
val R1 = A1.subtract(B1)
val R2 = B1.subtract(A1)
Now it returns the correct answer. Is there a more efficient way to do this?
The linked answer is misleading: the problem is not mutability. ArrayBuffer, which solved the problem, is mutable as well.
subtract compares elements internally using equals (and hashCode), and Java arrays do not override those methods, so they fall back to reference equality: two arrays with the same contents are still treated as different elements.
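The equality behaviour described above can be checked in plain Scala, without Spark. The object and value names below are illustrative only; the comparisons themselves are standard Scala collection and java.util.Arrays calls.

```scala
import scala.collection.mutable.ArrayBuffer

object EqualityDemo {
  val a: Array[Int] = Array(1, 2, 3)
  val b: Array[Int] = Array(1, 2, 3)

  // Java arrays inherit equals/hashCode from java.lang.Object,
  // so two arrays are equal only if they are the same object.
  val arraysEqual: Boolean = a.equals(b)

  // Content comparison for arrays must be requested explicitly.
  val sameContents: Boolean = java.util.Arrays.equals(a, b)

  // ArrayBuffer is mutable, yet it compares element by element.
  val buffersEqual: Boolean = ArrayBuffer(a: _*) == ArrayBuffer(b: _*)

  // .toSeq also gives element-wise equals, with matching hash codes.
  val seqsEqual: Boolean = a.toSeq == b.toSeq
  val hashesMatch: Boolean = a.toSeq.hashCode == b.toSeq.hashCode

  def main(args: Array[String]): Unit =
    // prints List(false, true, true, true, true)
    println(List(arraysEqual, sameContents, buffersEqual, seqsEqual, hashesMatch))
}
```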
Wrapping the arrays instead will work:
val R1 = A.map(_.toSeq).subtract(B.map(_.toSeq))
val R2 = B.map(_.toSeq).subtract(A.map(_.toSeq))
.toSeq turns a Java array into a Scala Seq (a WrappedArray in Scala 2.12 and earlier, an ArraySeq in 2.13), whose equals and hashCode compare elements rather than references.
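To see why the wrapping matters without running a cluster, here is a small local analogue of a hash-based set difference. hashSubtract is a hypothetical helper, not Spark's implementation; it only shares the relevant property that membership is decided by the elements' hashCode and equals.

```scala
object LocalSubtractDemo {
  // Hypothetical helper: a single-machine, hash-based set difference.
  // Like Spark's subtract it relies on hashCode and equals,
  // but it is not Spark's implementation.
  def hashSubtract[T](left: Seq[T], right: Seq[T]): Seq[T] = {
    val exclude: Set[T] = right.toSet // hash-based membership lookup
    left.filterNot(exclude.contains)
  }

  val a: Seq[Array[Int]] = Seq(Array(1, 2), Array(3, 4))
  val b: Seq[Array[Int]] = Seq(Array(3, 4), Array(5, 6))

  // Raw arrays: Array(3, 4) in b is a different object from Array(3, 4)
  // in a, so nothing is removed.
  val rawDiff: Seq[List[Int]] = hashSubtract(a, b).map(_.toList)

  // Wrapped with .toSeq: elements are compared by content.
  val seqDiff: Seq[List[Int]] =
    hashSubtract(a.map(_.toSeq), b.map(_.toSeq)).map(_.toList)

  def main(args: Array[String]): Unit = {
    println(rawDiff) // List(List(1, 2), List(3, 4))
    println(seqDiff) // List(List(1, 2))
  }
}
```

The right-hand side is turned into a Set so that each lookup is a hash probe; that is why a consistent hashCode matters as much as equals here.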