Thales Vilela

Reputation: 41

PySpark add column before/after date

I have a PySpark dataframe that looks like this:

         Date  Sales      Type
0  2020-01-01     10    hotdog
1  2020-01-01      5  icecream
2  2020-01-01      9      soda
3  2020-01-02      7    hotdog
4  2020-01-02      5  icecream
..        ...    ...       ...
89 2020-01-30      4  icecream
90 2020-01-30     11      soda
91 2020-01-31      7    hotdog
92 2020-01-31      3  icecream
93 2020-01-31     12      soda

I need to add a column that indicates whether the row's date is before or after 2020-01-15.

In pandas, I can do:

df['Before-After'] = df['Date'] < '2020-01-15'

How can I do that in a PySpark DataFrame?

Upvotes: 0

Views: 242

Answers (1)

notNull

Reputation: 31470

Use a when + otherwise expression for this case.

Example:

from pyspark.sql.functions import col, when

#sample data, with the Date column cast to DateType
df = spark.createDataFrame(
    [('2020-01-01',), ('2020-01-02',), ('2020-01-16',), ('2020-01-31',)],
    ['Date']
).withColumn("Date", col("Date").cast("date"))

#flag each row as before (True) or not before (False) the cutoff date
df.withColumn("Before-After", when(col("Date") < "2020-01-15", True).otherwise(False)).show()

#sql equivalent using a temp view
df.createOrReplaceTempView("tmp")
spark.sql("select Date, case when Date < '2020-01-15' then True else False end as `Before-After` from tmp").show()
#+----------+------------+
#|      Date|Before-After|
#+----------+------------+
#|2020-01-01|        true|
#|2020-01-02|        true|
#|2020-01-16|       false|
#|2020-01-31|       false|
#+----------+------------+
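
Likewise, the case expression in the SQL variant can be reduced to the bare comparison, which should return the same boolean column:

#shorter sql form of the same logic
spark.sql("select Date, Date < '2020-01-15' as `Before-After` from tmp").show()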

Upvotes: 2
