user2639287

Reputation: 1

Subtract the values in a paired RDD

I am new to Scala and Spark.

I have two RDDs like these:

RDD_A = (keyA, 5), (keyB, 10)

RDD_B = (keyA, 3), (keyB, 7)

How do I calculate RDD_A - RDD_B per key, so that I get (keyA, 2), (keyB, 3)?

I tried subtract and subtractByKey, but neither produces output like the above.

Upvotes: 0

Views: 195

Answers (2)

QuickSilver

Reputation: 4045

Here is an RDD solution. The inline code comments explain each step.

import org.apache.spark.sql.SparkSession

object SubtractRDD {

  def main(args: Array[String]): Unit = {
    // Create a local Spark session
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val list1 = List(("keyA", 5), ("keyB", 10))
    val list2 = List(("keyA", 3), ("keyB", 7))
    val rdd1 = spark.sparkContext.parallelize(list1) // convert list to RDD
    val rdd2 = spark.sparkContext.parallelize(list2)

    val result = rdd1.join(rdd2) // inner join on key: (key, (valueA, valueB))
      .mapValues { case (a, b) => a - b } // subtract the paired values
      .collectAsMap() // collect the result to the driver as a Map
    println(result) // contains keyA -> 2 and keyB -> 3

    spark.stop()
  }

}
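One thing to keep in mind: because join is an inner join, any key that appears in only one of the two RDDs is dropped from the result. If that matters for your data, a possible variant is fullOuterJoin, which yields Option values for each side. The sketch below assumes a missing value should count as 0. That default, the helper name subtractValues, and the extra keys keyC and keyD are illustrative assumptions, not part of the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SubtractWithMissingKeys {

  // Hypothetical helper: per-key a - b, treating a key absent from one side as 0.
  def subtractValues(a: RDD[(String, Int)], b: RDD[(String, Int)]): RDD[(String, Int)] =
    a.fullOuterJoin(b) // (key, (Option[Int], Option[Int]))
      .mapValues { case (x, y) => x.getOrElse(0) - y.getOrElse(0) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // keyC and keyD are made-up keys, each present on one side only
    val rddA = sc.parallelize(List(("keyA", 5), ("keyB", 10), ("keyC", 4)))
    val rddB = sc.parallelize(List(("keyA", 3), ("keyB", 7), ("keyD", 1)))

    // Contains keyA -> 2, keyB -> 3, keyC -> 4, keyD -> -1
    println(subtractValues(rddA, rddB).collectAsMap())

    spark.stop()
  }
}
```

If keys missing from one side should be excluded instead, the plain join shown above already does that.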

Upvotes: 0

Duelist

Reputation: 1572

Let's assume that each dataset has only one value per key. This example holds the data in DataFrames rather than RDDs:

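// Needed for toDF; assumes a SparkSession named spark is in scope (as in spark-shell)
import spark.implicits._

val df =
  Seq(
    ("A", 5),
    ("B", 10)
  ).toDF("key", "value")

val df2 =
  Seq(
    ("A", 3),
    ("B", 7)
  ).toDF("key", "value")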

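You can merge these DataFrames using union and perform the computation via groupBy as follows. Note that first and last depend on row order, which Spark does not guarantee after a shuffle, so this relies on the row from df preceding the row from df2 within each group: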

import org.apache.spark.sql.functions._
df.union(df2)
  .groupBy("key")
  .agg(first("value").minus(last("value")).as("value"))
  .show()

This prints:

+---+-----+
|key|value|
+---+-----+
|  B|    3|
|  A|    2|
+---+-----+
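Since first and last are sensitive to row order, an order-independent alternative is to join the two DataFrames on key and subtract the columns directly. This is only a sketch under the same one-value-per-key assumption. The helper name subtractByKey and the aliases "a" and "b" are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SubtractDataFrames {

  // Hypothetical helper: inner-joins on "key" and subtracts right "value" from left "value".
  def subtractByKey(left: DataFrame, right: DataFrame): DataFrame =
    left.as("a")
      .join(right.as("b"), col("a.key") === col("b.key"))
      .select(col("a.key").as("key"), (col("a.value") - col("b.value")).as("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._ // enables toDF on local Seqs

    val df = Seq(("A", 5), ("B", 10)).toDF("key", "value")
    val df2 = Seq(("A", 3), ("B", 7)).toDF("key", "value")

    // Rows (A, 2) and (B, 3); the display order is not guaranteed
    subtractByKey(df, df2).show()

    spark.stop()
  }
}
```

As with the RDD join, this is an inner join, so keys present in only one DataFrame do not appear in the output.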

Upvotes: 1
