Reputation:
Consider the following dataframe:
+---+----------+-----+
|rid| createdon|count|
+---+----------+-----+
|124|2017-06-15|    1|
|123|2017-06-14|    2|
|123|2017-06-14|    1|
+---+----------+-----+
I need to sum the count column across rows that have the same rid and createdon.
The resulting dataframe should therefore be as follows:
+---+----------+-----+
|rid| createdon|count|
+---+----------+-----+
|124|2017-06-15|    1|
|123|2017-06-14|    3|
+---+----------+-----+
I am using Spark 2.0.2.
I have tried agg, conditions inside select, etc., but couldn't find a solution. Can anyone help me?
Upvotes: 0
Views: 1512
Reputation: 27373
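This should do what you want: group by rid and createdon, then sum count within each group.
import org.apache.spark.sql.functions.sum
// The $"..." column syntax needs spark.implicits._ (already imported in spark-shell).
df
  .groupBy($"rid", $"createdon")
  .agg(sum($"count").as("count"))
  .show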
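For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch of the same groupBy/sum approach. The object name SumCountByKey, the helper sumCounts, the local master setting, and storing createdon as a plain string are all assumptions made for illustration; only the sample rows and the column names come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object SumCountByKey {
  // Hypothetical helper: sums "count" over rows sharing the same rid and createdon.
  def sumCounts(df: DataFrame): DataFrame =
    df.groupBy(col("rid"), col("createdon"))
      .agg(sum(col("count")).as("count"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("sum-count-by-key")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    // Sample rows copied from the question; createdon is kept as a string here.
    val df = Seq(
      (124, "2017-06-15", 1),
      (123, "2017-06-14", 2),
      (123, "2017-06-14", 1)
    ).toDF("rid", "createdon", "count")

    sumCounts(df).show()
    spark.stop()
  }
}
```

Two things to keep in mind: the row order printed by show is not guaranteed after a groupBy, and summing an integer column yields a bigint (Long) column, so the result type of count differs from the input.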
Upvotes: 0
Reputation: 3863
Try this:
import org.apache.spark.sql.{functions => func}
df.groupBy($"rid", $"createdon").agg(func.sum($"count").alias("count"))
Upvotes: 1