Reputation: 21
I have 2 sets, one with positive and one with negative samples. First value in array is item identification, second value refers to sum of such items.
positive: Array[(String, Int)]
negative: Array[(String, Int)]
I would like to construct array result, which will contain item name and its positive to negative ratio as float number. The command below returns me only integer ratio.
val result = positive.union(negativeCount).reduceByKey((a, b) => (a / b)
Can you please advice how to make the ratio a float number?
Thanks.
Upvotes: 0
Views: 3337
Reputation: 330083
As far as I understand your intentions you should use join
not an union
val positive = sc.parallelize(Seq(("a", 1), ("b", 2)))
val negative = sc.parallelize(Seq(("a", 4), ("b", 1)))
val ratios = positive
.join(negative)
.mapValues{case (x: Int, y: Int) => x.toFloat / y}
ratios.collect
// Array[(String, Float)] = Array((a,0.25), (b,2.0))
With DataFrames:
val ratiosDF = positive.toDF("pk", "pv")
.join(negative.toDF("nk", "nv"), $"pk" === $"nk")
.select($"pk".alias("k"), $"pv".divide($"nv").alias("v"))
ratiosDF.show
// +---+----+
// | k| v|
// +---+----+
// | a|0.25|
// | b| 2.0|
// +---+----+
Using union
followed by reduceByKey
doesn't make sense and gives no strong guarantees about the order of values.
Upvotes: 4
Reputation: 195
Make one of the integers float using toFloat
val result = positive.union(negativeCount)
.mapValues(_.toFloat)
.reduceByKey((a, b) => (a / b))
Upvotes: -1