volk

Reputation: 21

Spark - Reduce with division operator

I have two sets, one with positive and one with negative samples. The first value in each tuple is the item identifier and the second is the sum of such items.

positive: Array[(String, Int)]

negative: Array[(String, Int)]

I would like to construct a result array that contains the item name and its positive-to-negative ratio as a float. The command below returns only an integer ratio.

val result = positive.union(negative).reduceByKey((a, b) => a / b)
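As far as I can tell, the truncation comes from plain integer division in Scala: dividing two Ints always yields an Int, so the fraction is lost:

1 / 4
// res: Int = 0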

Can you please advise how to make the ratio a float number?

Thanks.

Upvotes: 0

Views: 3337

Answers (2)

zero323

Reputation: 330083

As far as I understand your intentions, you should use join, not union:

val positive = sc.parallelize(Seq(("a", 1), ("b", 2)))
val negative = sc.parallelize(Seq(("a", 4), ("b", 1)))

val ratios = positive
  .join(negative)
  .mapValues{case (x: Int, y: Int) => x.toFloat / y}

ratios.collect
// Array[(String, Float)] = Array((a,0.25), (b,2.0)) 
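One caveat, assuming some keys could appear in only one of the two RDDs (that isn't stated in the question): an inner join drops such keys. A fullOuterJoin sketch keeps them:

val allRatios = positive
  .fullOuterJoin(negative)
  .mapValues { case (p, n) => p.getOrElse(0).toFloat / n.getOrElse(0) }
// a key missing from negative gives Float.PositiveInfinity,
// one missing from positive gives 0.0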

With DataFrames:

// assumes the SQL implicits are in scope, e.g. import sqlContext.implicits._
val ratiosDF = positive.toDF("pk", "pv")
  .join(negative.toDF("nk", "nv"), $"pk" === $"nk")
  .select($"pk".alias("k"), $"pv".divide($"nv").alias("v"))
ratiosDF.show

// +---+----+
// |  k|   v|
// +---+----+
// |  a|0.25|
// |  b| 2.0|
// +---+----+

Using union followed by reduceByKey doesn't make sense here: reduceByKey gives no guarantee about the order in which the two values for a key are combined, so you can't tell whether you'll divide the positive count by the negative one or the other way around.
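For illustration, with the toy data above (the exact output depends on how Spark orders the values, so this is just a sketch):

positive.union(negative).reduceByKey((a, b) => a / b).collect
// for key "a" this may compute 1 / 4 = 0 or 4 / 1 = 4,
// depending on which value the reducer sees first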

Upvotes: 4

Dr. Vick

Reputation: 195

Make one of the integers a float using toFloat:

val result = positive.union(negative)
  .mapValues(_.toFloat)
  .reduceByKey((a, b) => a / b)

Upvotes: -1
