Sai

Reputation: 1117

How to perform a division operation on a Spark DataFrame using Scala?

I have a DataFrame like the one below.

+---+---+-----+
|uId| Id|  sum|
+---+---+-----+
|  3|  1|  1.0|
|  7|  1|  1.0|
|  1|  2|  3.0|
|  1|  1|  1.0|
|  6|  5|  1.0|
+---+---+-----+

Using the above DataFrame, I want to generate a new DataFrame in which each sum value is divided by the count of rows sharing the same Id.

For example:

For uId=3 and Id=1, the new sum should be (old sum * 1 / count of Id 1), i.e.
1.0*1/3 = 0.333

For uId=7 and Id=1, the new sum should be (old sum * 1 / count of Id 1), i.e.
1.0*1/3 = 0.333

For uId=1 and Id=1, the new sum should be (old sum * 1 / count of Id 1), i.e.
1.0*1/3 = 0.333

For uId=1 and Id=2, the new sum should be (old sum * 1 / count of Id 2), i.e.
3.0*1/1 = 3.0

For uId=6 and Id=5, the new sum should be (old sum * 1 / count of Id 5), i.e.
1.0*1/1 = 1.0

My final output should be:

+---+---+---------+
|uId| Id|      sum|
+---+---+---------+
|  3|  1|  0.33333|
|  7|  1|  0.33333|
|  1|  2|  3.0    |
|  1|  1|  0.33333|
|  6|  5|  1.0    |
+---+---+---------+

Upvotes: 0

Views: 6973

Answers (1)

Ramesh Maharjan

Reputation: 41957

You can use a Window function to get the count of each group of the id column, and then divide the original sum by that count:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// partition by id so count("id") is evaluated per id group
val windowSpec = Window.partitionBy("id")

df.withColumn("sum", $"sum" / count("id").over(windowSpec))

You should then have the final DataFrame as

+---+---+------------------+
|uId|Id |sum               |
+---+---+------------------+
|3  |1  |0.3333333333333333|
|7  |1  |0.3333333333333333|
|1  |1  |0.3333333333333333|
|6  |5  |1.0               |
|1  |2  |3.0               |
+---+---+------------------+
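For reference, a self-contained sketch of the whole approach (assuming a local SparkSession for experimentation; the `DivideByGroupCount` object name and `local[*]` master are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object DivideByGroupCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("divide-by-group-count")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // recreate the question's DataFrame
    val df = Seq(
      (3, 1, 1.0),
      (7, 1, 1.0),
      (1, 2, 3.0),
      (1, 1, 1.0),
      (6, 5, 1.0)
    ).toDF("uId", "Id", "sum")

    // count("Id") over this window yields, for each row, the number of
    // rows that share the same Id; dividing by it gives the desired sum
    val windowSpec = Window.partitionBy("Id")
    df.withColumn("sum", $"sum" / count("Id").over(windowSpec)).show(false)

    spark.stop()
  }
}
```

An equivalent alternative is a `groupBy("Id").count()` followed by a join back to the original DataFrame, but the window version avoids the explicit join.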

Upvotes: 2

Related Questions