Fisher Coder

Reputation: 3576

How to compare two JavaPairRDDs by key and value?

I'd like to use Spark's Java API to compare two JavaPairRDDs by key and check whether each key has exactly the same value in both.

Right now I'm only checking the count() of the intersection and the union, as shown below, but this is not enough:

        JavaPairRDD<String, String> intersectionJavaPairRDD = hBaseJavaPairRDD.intersection(hiveJavaPairRDD);
        JavaPairRDD<String, String> unionJavaPairRDD = hBaseJavaPairRDD.union(hiveJavaPairRDD).distinct();

        if (intersectionJavaPairRDD.count() != unionJavaPairRDD.count()
            || hiveJavaPairRDD.count() != hBaseJavaPairRDD.count()) {
            System.err.println(
                "ERROR: SxS validation failed...");
            System.exit(-1);
        }

How can I compare the values of entries that have the same key?

Thanks a lot!

Upvotes: 0

Views: 1129

Answers (1)

Haroun Mohammedi

Reputation: 2424

I'm coming from Scala, but I believe that with a few syntax changes this will also work in Java.

My idea is to join the two RDDs and then compare the two columns of values.

        val isEquals = hBaseJavaPairRDD
          .join(hiveJavaPairRDD)
          .map {
            case (id, (v1, v2)) => v1 == v2
          }
          .reduce(_ && _)

The idea behind this solution is as follows:

  1. For each key, put the values from the first and the second RDD in the same row. This is what the join operation does.
  2. Map the joined RDD so that each row becomes true if the two values are equal and false otherwise.
  3. Apply a reduce function to the mapped RDD, using AND as the binary operation between elements.

The reduce returns true if all the elements in the mapped RDD are true, i.e. all the values are equal, and false otherwise.
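For readers who want to see the same three steps in Java, here is a minimal Spark-free sketch. It models each pair RDD as a java.util.Map with unique keys, which is an assumption (real pair RDDs can hold duplicate keys), and the class and method names are hypothetical. One thing it makes visible: Spark's join is an inner join, so keys that exist in only one RDD are silently dropped, and reduce on an empty RDD throws. The second method therefore also reports one-sided keys, which is roughly what JavaPairRDD's fullOuterJoin would let you do in Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: plain maps stand in for the two pair RDDs.
class PairCompare {

    // Steps 1-3 above: inner join on key, map each pair to a boolean, AND-reduce.
    // Unlike Spark's reduce, the identity value `true` makes an empty join return true.
    static boolean allJoinedValuesEqual(Map<String, String> left, Map<String, String> right) {
        return left.entrySet().stream()
                .filter(e -> right.containsKey(e.getKey()))                       // inner join
                .map(e -> Objects.equals(e.getValue(), right.get(e.getKey())))    // v1 == v2
                .reduce(true, Boolean::logicalAnd);                               // _ && _
    }

    // Full-outer-join style check: returns (in sorted order) every key whose value
    // differs between the two sides or that is present on only one side.
    static List<String> mismatchedKeys(Map<String, String> left, Map<String, String> right) {
        Set<String> allKeys = new TreeSet<>(left.keySet());
        allKeys.addAll(right.keySet());
        List<String> mismatches = new ArrayList<>();
        for (String key : allKeys) {
            boolean onBothSides = left.containsKey(key) && right.containsKey(key);
            if (!onBothSides || !Objects.equals(left.get(key), right.get(key))) {
                mismatches.add(key);
            }
        }
        return mismatches;
    }
}
```

With left = {k1=v1} and right = {k1=v1, k3=v3}, allJoinedValuesEqual returns true because the inner join never sees k3, while mismatchedKeys returns [k3]. If the two data sets must match exactly, the second style of check (or the count comparison from the question combined with the join) is the safer one.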

Sorry for answering in Scala; I hope it helps.

Upvotes: 1
