sparc
sparc

Reputation: 429

Previous and next month, year based on date in Pyspark

I have data like this,

enter image description here

and I need output like this

enter image description here

How do I achieve this in Pyspark?

Upvotes: 1

Views: 2088

Answers (2)

wwnde
wwnde

Reputation: 26676

Use date Functions. First format date to string using to_date

Next use add_months to create previous and next. Format this using date_format and store each in an array. Combine the two arrays into a struct column. Explode the struct column. Code and logic below

Data

 df=spark.createDataFrame([('1'    , 'A' ,'14-02-22' ) ,
 ('1'    , 'B'    , '11-03-22' )],
  ('Id' , 'Status' , 'Date' ))
  
df.show()

+---+------+--------+
| Id|Status|    Date|
+---+------+--------+
|  1|     A|14-02-22|
|  1|     B|11-03-22|



(df.withColumn('Date',to_date('Date', "dd-MM-yy"))#Coerce string to date
 .withColumn('Date1', F.struct(array(date_format(add_months('Date',-1),"MMM yy"), date_format('Date',"MMM yy")).alias('Previous')#Create an array of date and previous day, store in struct as Previous
  ,array(date_format('Date',"MMM yy"),date_format(add_months('Date',1),"MMM yy")).alias('Next')#Create an array of date and next day, store in struct as Previous
  )).select('Id','Status','Date','Date1.*')#Select all required columns exploding Date1 with each struct element as column
).show(truncate=False)


+---+------+----------+----------------+----------------+
|Id |Status|Date      |Previous        |Next            |
+---+------+----------+----------------+----------------+
|1  |A     |2022-02-14|[Jan 22, Feb 22]|[Feb 22, Mar 22]|
|1  |B     |2022-03-11|[Feb 22, Mar 22]|[Mar 22, Apr 22]|
+---+------+----------+----------------+----------------+

Alternatively use concat_ws as follows

(df.withColumn('Date',to_date('Date', "dd-MM-yy"))#Coerce string to date
 .withColumn('Date1', F.struct(concat_ws('-',lit((date_format(add_months('Date',-1),"MMM yy").astype('string'))), lit((date_format('Date',"MMM yy").astype('string')))).alias('Previous')#create string by concat of date with string format of the previous months date
  ,concat_ws('-',lit((date_format(add_months('Date',1),"MMM yy").astype('string'))), lit((date_format('Date',"MMM yy").astype('string')))).alias('Next')#Create a string by concat date with string format of the next months date
  )).select('Id','Status','Date','Date1.*')#Select all required columns exploding Date1 with each struct element as column
).show(truncate=False)

+---+------+----------+-------------+-------------+
|Id |Status|Date      |Previous     |Next         |
+---+------+----------+-------------+-------------+
|1  |A     |2022-02-14|Jan 22-Feb 22|Mar 22-Feb 22|
|1  |B     |2022-03-11|Feb 22-Mar 22|Apr 22-Mar 22|
+---+------+----------+-------------+-------------+

Upvotes: 1

Amir Hossein Shahdaei
Amir Hossein Shahdaei

Reputation: 1256

The essential functions you need for this task are: "date_format" to make date in the desired format and "add_months" to add or subtract to date.

from pyspark.sql import functions as F

date_df = spark.createDataFrame(
    [
        ('A', '02/09/2022'),
        ('B', '02/07/2022'),],
    ['name', 'date'])

(
    date_df
    .withColumn('date', F.to_date('date', 'dd/MM/yyyy'))
    .withColumn(
        'current_month', 
        F.date_format(F.col('date'), 'MMM yyyy'))
    .withColumn(
        'prev_month',
        F.date_format(
            F.add_months(F.col('date'),1), 
            'MMM yyyy'))
    .withColumn(
        'next_month', 
        F.date_format(
            F.add_months(F.col('date'),-1), 
            'MMM yyyy'))
    .withColumn(
        'Previous',
        F.concat(F.col('prev_month'), F.lit('-'), F.col('current_month')))
    .withColumn(
        'Next',
        F.concat(F.col('current_month'), F.lit('-'), F.col('next_month')))

).show()

+----+----------+-------------+----------+----------+-----------------+-----------------+
|name|      date|current_month|prev_month|next_month|         Previous|             Next|
+----+----------+-------------+----------+----------+-----------------+-----------------+
|   A|2022-09-02|     Sep 2022|  Oct 2022|  Aug 2022|Oct 2022-Sep 2022|Sep 2022-Aug 2022|
|   B|2022-07-02|     Jul 2022|  Aug 2022|  Jun 2022|Aug 2022-Jul 2022|Jul 2022-Jun 2022|
+----+----------+-------------+----------+----------+-----------------+-----------------+

Upvotes: 1

Related Questions