Reputation: 1345
I have a Spark dataframe like the one below
+-----+----------+----------+
| ID| date| count |
+-----+----------+----------+
|54500|2016-05-02| 0|
|54500|2016-05-09| 0|
|54500|2016-05-16| 0|
|54500|2016-05-23| 0|
|54500|2016-06-06| 0|
|54500|2016-06-13| 0|
|54441|2016-06-20| 0|
|54441|2016-06-27| 0|
|54441|2016-07-04| 0|
|54441|2016-07-11| 0|
+-----+----------+----------+
I want to add an additional column that contains the count of records for each ID in the dataframe, while avoiding a for loop. The target dataframe looks like this
+-----+----------+----------+
| ID| date| count |
+-----+----------+----------+
|54500|2016-05-02| 6|
|54500|2016-05-09| 6|
|54500|2016-05-16| 6|
|54500|2016-05-23| 6|
|54500|2016-06-06| 6|
|54500|2016-06-13| 6|
|54441|2016-06-20| 4|
|54441|2016-06-27| 4|
|54441|2016-07-04| 4|
|54441|2016-07-11| 4|
+-----+----------+----------+
I tried this:
import org.apache.spark.sql.expressions.Window
var s = Window.partitionBy("ID")
var df2 = df.withColumn("count", count.over(s))
but it gives this error:
error: ambiguous reference to overloaded definition,
both method count in object functions of type (columnName: String)org.apache.spark.sql.TypedColumn[Any,Long]
and method count in object functions of type (e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
match expected type ?
Upvotes: 0
Views: 866
Reputation: 5487
You can follow the approach below:
import spark.implicits._
// Rebuild just the ID column of the sample data
val df1 = List(54500, 54500, 54500, 54500, 54500, 54500, 54441, 54441, 54441, 54441).toDF("ID")
// Count rows per ID, then join the counts back onto the original rows
val df2 = df1.groupBy("ID").count()
df1.join(df2, Seq("ID"), "left").show(false)
+-----+-----+
|ID |count|
+-----+-----+
|54500|6 |
|54500|6 |
|54500|6 |
|54500|6 |
|54500|6 |
|54500|6 |
|54441|4 |
|54441|4 |
|54441|4 |
|54441|4 |
+-----+-----+
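As a side note, the window-function attempt from the question also works once the ambiguity is resolved: count just needs an explicit column argument so the compiler can pick the Column-based overload. A minimal sketch, assuming df is the original dataframe from the question:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count}
// Partition rows by ID so count() runs over all rows sharing the same ID
val byId = Window.partitionBy("ID")
// Passing an explicit Column picks the count(e: Column) overload and
// avoids the "ambiguous reference to overloaded definition" error
val withCount = df.withColumn("count", count(col("ID")).over(byId))
withCount.show(false)
This yields the same per-ID counts without the extra join.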
Upvotes: 1