Reputation: 701
I am having below dataframe with values. I want to add next concecative id in the column id which must be unique as well incrementing in nature.
+----------------+----+--------------------+
|local_student_id| id| last_updated|
+----------------+----+--------------------+
| 610931|null| null|
| 599768| 3|2020-02-26 15:47:...|
| 633719|null| null|
| 612949| 2|2020-02-26 15:47:...|
| 591819| 1|2020-02-26 15:47:...|
| 595539| 4|2020-02-26 15:47:...|
| 423287|null| null|
| 641322| 5|2020-02-26 15:47:...|
+----------------+----+--------------------+
I want below expected output. can anybody hemp me? I am new to Pyspark. and also want to add current timestamp in last_updated column.
+----------------+----+--------------------+
|local_student_id| id| last_updated|
+----------------+----+--------------------+
| 610931| 6|2020-02-26 16:00:...|
| 599768| 3|2020-02-26 15:47:...|
| 633719| 7|2020-02-26 16:00:...|
| 612949| 2|2020-02-26 15:47:...|
| 591819| 1|2020-02-26 15:47:...|
| 595539| 4|2020-02-26 15:47:...|
| 423287| 8|2020-02-26 16:00:...|
| 641322| 5|2020-02-26 15:47:...|
+----------------+----+--------------------+
actually i tried
final_data = final_data.withColumn(
'id', when(col('id').isNull(), row_number() + max(col('id'))).otherwise(col('id')))
but it gives the below Error:-
: org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and '`local_student_id`' is not an aggregate function. Wrap '(CASE WHEN (`id` IS NULL) THEN (CAST(row_number() AS BIGINT) + max(`id`)) ELSE `id` END AS `id`)' in windowing function(s) or wrap '`local_student_id`' in first() (or first_value) if you don't care which value you get.;;
Upvotes: 1
Views: 752
Reputation: 15258
here is the code you need :
from pyspark.sql import functions as F, Window
max_id = final_data.groupBy().max("id").collect()[0][0]
final_data.withColumn(
"id",
F.coalesce(
F.col("id"),
F.row_number().over(Window.orderBy("id")) + F.lit(max_id)
)
).withColumn(
"last_updated",
F.coalesce(
F.col("last_updated"),
F.current_timestamp()
)
)
Upvotes: 3