Reputation: 3576
I'd like to use Spark's Java API to compare two JavaPairRDDs by key and check whether each key has exactly the same value in both.
Right now I'm only checking the count() of the intersection and the union, which is not enough:
    JavaPairRDD<String, String> intersectionJavaPairRDD = hBaseJavaPairRDD.intersection(hiveJavaPairRDD);
    JavaPairRDD<String, String> unionJavaPairRDD = hBaseJavaPairRDD.union(hiveJavaPairRDD).distinct();
    if (intersectionJavaPairRDD.count() != unionJavaPairRDD.count()
            || hiveJavaPairRDD.count() != hBaseJavaPairRDD.count()) {
        System.err.println("ERROR: SxS validation failed...");
        System.exit(-1);
    }
How can I compare the values of the two RDDs for each key they share?
Thanks a lot!
Upvotes: 0
Views: 1129
Reputation: 2424
I'm coming from Scala, but I believe that with a few syntax changes this will also work in Java.
My idea is to join the two RDDs and then compare the two columns of values.
    val isEquals = hBaseJavaPairRDD
      .join(hiveJavaPairRDD)
      .map {
        case (id, (v1, v2)) => v1 == v2
      }
      .reduce(_ && _)
The idea behind this solution is the following:
1. Put the values of the two RDDs in the same row. This is performed with the join operation.
2. Map the joined RDD so that for each row we put true if the two values are equal, false otherwise.
3. Apply the reduce function on this mapped RDD with AND as the binary operation between elements.
Applying the reduce function returns true if all the elements in the joined RDD are true, i.e. all the values are equal, and false otherwise.
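To make the join, map, and reduce steps concrete on the Java side, here is a minimal sketch that models them locally with plain java.util maps instead of RDDs, so it runs without a Spark cluster. The class KeyedValueCheck and both method names are hypothetical, and the model assumes each key appears at most once per side (an RDD may hold duplicate keys, in which case join pairs every combination). In Spark's Java API the same chain would use join, map and reduce on the JavaPairRDD.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper: a local, Spark-free model of the join -> map -> reduce check. */
final class KeyedValueCheck {

    /**
     * Mirrors the inner join: only keys present on BOTH sides are compared.
     * A key that exists on one side only is silently ignored, as with join.
     */
    static boolean sharedKeysMatch(Map<String, String> left, Map<String, String> right) {
        return left.entrySet().stream()
                // "join": keep only the keys that also exist on the right side
                .filter(e -> right.containsKey(e.getKey()))
                // "map": true when both sides hold an equal value for this key
                .map(e -> Objects.equals(e.getValue(), right.get(e.getKey())))
                // "reduce": AND everything together. The identity TRUE means an
                // empty join yields true here, whereas Spark's reduce throws on
                // an empty RDD.
                .reduce(Boolean.TRUE, Boolean::logicalAnd);
    }

    /** Stricter variant: also fails when a key is missing from either side. */
    static boolean sameKeysAndValues(Map<String, String> left, Map<String, String> right) {
        return left.keySet().equals(right.keySet()) && sharedKeysMatch(left, right);
    }

    public static void main(String[] args) {
        Map<String, String> hbase = Map.of("k1", "v1", "k2", "v2");
        Map<String, String> hive = Map.of("k1", "v1", "k2", "CHANGED");
        System.out.println(sharedKeysMatch(hbase, hbase));
        System.out.println(sharedKeysMatch(hbase, hive));
    }
}
```

Because the join is an inner join, a key that is missing from one side never reaches the comparison, so the check on shared keys is best combined with a key-set (or count) check like the one in sameKeysAndValues.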
Sorry for answering in Scala, hope it helps.
Upvotes: 1