Reputation: 707
I have pyspark dataframe, in which data column is there, which has weekend dates as well. I just want to change these dates to previous or next working days.
from pyspark.sql.session import SparkSession
spark = SparkSession.builder.getOrCreate()
columns = ['Date', 'id', 'dogs', 'cats']
vals = [('04-05-2018',1, 2, 0), ('05-05-2018',2, 0, 1), ('06-05-2018',2, 0, 1)]
df = spark.createDataFrame(vals, columns)
df.show()
DataFrame look like:
+----------+---+----+----+
| Date| id|dogs|cats|
+----------+---+----+----+
|04-05-2018| 1| 2| 0|
|05-05-2018| 2| 0| 1|
|06-05-2018| 2| 0| 1|
+----------+---+----+----+
Now, i'm able to identify the weekday, as in separate column
df = df.withColumn('Date', unix_timestamp(df['Date'].cast("string"), 'dd-MM-yyyy').cast("double").cast('timestamp').cast('date'))
df = df.select('Date', date_format('Date', 'u').alias('dow_number'), 'id', 'dogs', 'cats')
temp = df
temp.show()
+----------+----------+---+----+----+
| Date|dow_number| id|dogs|cats|
+----------+----------+---+----+----+
|2018-05-04| 5| 1| 2| 0|
|2018-05-05| 6| 2| 0| 1|
|2018-05-06| 7| 2| 0| 1|
+----------+----------+---+----+----+
Now i just want to shift the data to last working day or next working day if it is on weekend, means i want my dataframe to be like this:
+----------+----------+---+----+----+
| Date|dow_number| id|dogs|cats|
+----------+----------+---+----+----+
|2018-05-04| 5| 1| 2| 0|
|2018-05-04| 5| 2| 0| 1|
|2018-05-04| 5| 2| 0| 1|
+----------+----------+---+----+----+
Thanks in advance.
Upvotes: 1
Views: 1842
Reputation: 5870
Using the dow_number generated, you can apply condition to check and subtract date using date_sub(),
df = df.withColumn('Date1',F.when(df['dow_number'] == 6,F.date_sub(df.Date,1)).when(df['dow_number'] == 7,F.date_sub(df.Date,2)).otherwise(df.Date))
+----------+----------+---+----+----+----------+
| Date|dow_number| id|dogs|cats| Date1|
+----------+----------+---+----+----+----------+
|2018-05-04| 5| 1| 2| 0|2018-05-04|
|2018-05-05| 6| 2| 0| 1|2018-05-04|
|2018-05-06| 7| 2| 0| 1|2018-05-04|
+----------+----------+---+----+----+----------+
I believe, you don't need dow_number to change as well.If you need, either you can use date_format on new date and get it (or) apply another condition as above. Hope this helps!
Upvotes: 2