Zdev
Zdev

Reputation: 36

How to add the incremental date value with respect to first row value in spark dataframe

Input :

+------+--------+  
|Test  |01-12-20|  
|Ravi  |    null|  
|Son   |    null|

Expected output :

+------+--------+  
|Test  |01-12-20|  
|Ravi  |02-12-20|  
|Son   |03-12-20|

I tried with .withColumn(col("dated"),date_add(col("dated"),1)); But this result in NULL for all the columns values.

Could you please help me with getting the incremental values on the date second column?

Upvotes: 0

Views: 498

Answers (1)

dsk
dsk

Reputation: 2003

This will be a working solution for you

Input

df = spark.createDataFrame([("Test", "01-12-20"),("Ravi", None),("Son", None)],[ "col1","col2"])
df.show()
df = df.withColumn("col2", F.to_date(F.col("col2"),"dd-MM-yy"))
# a dummy col for window function
df = df.withColumn("del_col", F.lit(0))
_w = W.partitionBy(F.col("del_col")).orderBy(F.col("del_col").desc())
df = df.withColumn("rn_no", F.row_number().over(_w)-1)
# Create a column with the same date
df = df.withColumn("dated", F.first("col2").over(_w))

df = df.selectExpr('*', 'date_add(dated, rn_no) as next_date')
df.show()

DF

+----+--------+
|col1|    col2|
+----+--------+
|Test|01-12-20|
|Ravi|    null|
| Son|    null|
+----+--------+

Final Output

+----+----------+-------+-----+----------+----------+
|col1|      col2|del_col|rn_no|     dated| next_date|
+----+----------+-------+-----+----------+----------+
|Test|2020-12-01|      0|    0|2020-12-01|2020-12-01|
|Ravi|      null|      0|    1|2020-12-01|2020-12-02|
| Son|      null|      0|    2|2020-12-01|2020-12-03|
+----+----------+-------+-----+----------+----------+

Upvotes: 3

Related Questions