Reputation: 429
I have data like this,
and I need output like this
How do I achieve this in Pyspark?
Upvotes: 1
Views: 2088
Reputation: 26676
Use date Functions.
First format date to string using to_date
Next use add_months
to create previous and next. Format this using date_format
and store each in an array
. Combine the two arrays into a struct
column. Explode the struct column. Code and logic below
Data
df=spark.createDataFrame([('1' , 'A' ,'14-02-22' ) ,
('1' , 'B' , '11-03-22' )],
('Id' , 'Status' , 'Date' ))
df.show()
+---+------+--------+
| Id|Status| Date|
+---+------+--------+
| 1| A|14-02-22|
| 1| B|11-03-22|
(df.withColumn('Date',to_date('Date', "dd-MM-yy"))#Coerce string to date
.withColumn('Date1', F.struct(array(date_format(add_months('Date',-1),"MMM yy"), date_format('Date',"MMM yy")).alias('Previous')#Create an array of date and previous day, store in struct as Previous
,array(date_format('Date',"MMM yy"),date_format(add_months('Date',1),"MMM yy")).alias('Next')#Create an array of date and next day, store in struct as Previous
)).select('Id','Status','Date','Date1.*')#Select all required columns exploding Date1 with each struct element as column
).show(truncate=False)
+---+------+----------+----------------+----------------+
|Id |Status|Date |Previous |Next |
+---+------+----------+----------------+----------------+
|1 |A |2022-02-14|[Jan 22, Feb 22]|[Feb 22, Mar 22]|
|1 |B |2022-03-11|[Feb 22, Mar 22]|[Mar 22, Apr 22]|
+---+------+----------+----------------+----------------+
Alternatively use concat_ws as follows
(df.withColumn('Date',to_date('Date', "dd-MM-yy"))#Coerce string to date
.withColumn('Date1', F.struct(concat_ws('-',lit((date_format(add_months('Date',-1),"MMM yy").astype('string'))), lit((date_format('Date',"MMM yy").astype('string')))).alias('Previous')#create string by concat of date with string format of the previous months date
,concat_ws('-',lit((date_format(add_months('Date',1),"MMM yy").astype('string'))), lit((date_format('Date',"MMM yy").astype('string')))).alias('Next')#Create a string by concat date with string format of the next months date
)).select('Id','Status','Date','Date1.*')#Select all required columns exploding Date1 with each struct element as column
).show(truncate=False)
+---+------+----------+-------------+-------------+
|Id |Status|Date |Previous |Next |
+---+------+----------+-------------+-------------+
|1 |A |2022-02-14|Jan 22-Feb 22|Mar 22-Feb 22|
|1 |B |2022-03-11|Feb 22-Mar 22|Apr 22-Mar 22|
+---+------+----------+-------------+-------------+
Upvotes: 1
Reputation: 1256
The essential functions you need for this task are: "date_format" to make date in the desired format and "add_months" to add or subtract to date.
from pyspark.sql import functions as F
date_df = spark.createDataFrame(
[
('A', '02/09/2022'),
('B', '02/07/2022'),],
['name', 'date'])
(
date_df
.withColumn('date', F.to_date('date', 'dd/MM/yyyy'))
.withColumn(
'current_month',
F.date_format(F.col('date'), 'MMM yyyy'))
.withColumn(
'prev_month',
F.date_format(
F.add_months(F.col('date'),1),
'MMM yyyy'))
.withColumn(
'next_month',
F.date_format(
F.add_months(F.col('date'),-1),
'MMM yyyy'))
.withColumn(
'Previous',
F.concat(F.col('prev_month'), F.lit('-'), F.col('current_month')))
.withColumn(
'Next',
F.concat(F.col('current_month'), F.lit('-'), F.col('next_month')))
).show()
+----+----------+-------------+----------+----------+-----------------+-----------------+
|name| date|current_month|prev_month|next_month| Previous| Next|
+----+----------+-------------+----------+----------+-----------------+-----------------+
| A|2022-09-02| Sep 2022| Oct 2022| Aug 2022|Oct 2022-Sep 2022|Sep 2022-Aug 2022|
| B|2022-07-02| Jul 2022| Aug 2022| Jun 2022|Aug 2022-Jul 2022|Jul 2022-Jun 2022|
+----+----------+-------------+----------+----------+-----------------+-----------------+
Upvotes: 1