Reputation: 36
Input :
+------+--------+
|Test |01-12-20|
|Ravi | null|
|Son | null|
Expected output :
+------+--------+
|Test |01-12-20|
|Ravi |02-12-20|
|Son |03-12-20|
I tried with .withColumn(col("dated"),date_add(col("dated"),1));
But this result in NULL for all the columns values.
Could you please help me with getting the incremental values on the date second column?
Upvotes: 0
Views: 498
Reputation: 2003
This will be a working solution for you
Input
df = spark.createDataFrame([("Test", "01-12-20"),("Ravi", None),("Son", None)],[ "col1","col2"])
df.show()
df = df.withColumn("col2", F.to_date(F.col("col2"),"dd-MM-yy"))
# a dummy col for window function
df = df.withColumn("del_col", F.lit(0))
_w = W.partitionBy(F.col("del_col")).orderBy(F.col("del_col").desc())
df = df.withColumn("rn_no", F.row_number().over(_w)-1)
# Create a column with the same date
df = df.withColumn("dated", F.first("col2").over(_w))
df = df.selectExpr('*', 'date_add(dated, rn_no) as next_date')
df.show()
DF
+----+--------+
|col1| col2|
+----+--------+
|Test|01-12-20|
|Ravi| null|
| Son| null|
+----+--------+
Final Output
+----+----------+-------+-----+----------+----------+
|col1| col2|del_col|rn_no| dated| next_date|
+----+----------+-------+-----+----------+----------+
|Test|2020-12-01| 0| 0|2020-12-01|2020-12-01|
|Ravi| null| 0| 1|2020-12-01|2020-12-02|
| Son| null| 0| 2|2020-12-01|2020-12-03|
+----+----------+-------+-----+----------+----------+
Upvotes: 3