Rahul Patidar

Reputation: 209

How to add an extra column with the current date to a Spark DataFrame

I am trying to add a column to my existing PySpark DataFrame using the withColumn method. I want to put the current date in this column. My source does not have any date column, so I am adding this current date column to my DataFrame and saving the DataFrame to my table, so that I can later use the current date column for tracking purposes. I am using the code below:

    df2 = df.withColumn("Curr_date", datetime.now().strftime('%Y-%m-%d'))

Here df is my existing DataFrame, and I want to save df2 as a table with the Curr_date column. However, withColumn expects an existing column or a Column expression (for example via lit), not the plain string returned by datetime.now().strftime('%Y-%m-%d'). Could someone please guide me on how to add this date column to my DataFrame?

Upvotes: 4

Views: 36269

Answers (3)

Steven

Reputation: 15258

Use either lit or current_date:

from datetime import datetime

from pyspark.sql import functions as F

# Option 1: a literal string computed once on the driver with Python's datetime
df2 = df.withColumn("Curr_date", F.lit(datetime.now().strftime("%Y-%m-%d")))

# Option 2: Spark's built-in current_date(), which yields a proper date column
df2 = df.withColumn("Curr_date", F.current_date())

Upvotes: 10

suresiva

Reputation: 3173

current_timestamp() is good, but it is evaluated once per query, so every row processed in that query gets the same timestamp.

If you prefer the timestamp at which each row is actually processed, you can use the method below, which calls java.time.LocalDateTime.now() for every row through Spark's reflect function:

from pyspark.sql.functions import expr

df2 = df.withColumn('current', expr("reflect('java.time.LocalDateTime', 'now')"))
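
To see the difference side by side, here is a small sketch (assuming any existing DataFrame df): current_timestamp() repeats the same value on every row of the query, while the reflect call is evaluated row by row.

    from pyspark.sql.functions import current_timestamp, expr

    # per-query timestamp vs. per-row timestamp
    df.withColumn("query_ts", current_timestamp()) \
      .withColumn("row_ts", expr("reflect('java.time.LocalDateTime', 'now')")) \
      .show(truncate=False)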

Upvotes: 4

Lamanus

Reputation: 13541

There is a Spark function current_timestamp().

from pyspark.sql.functions import current_timestamp, date_format

# format the current timestamp as a yyyy-MM-dd string column
df.withColumn('current', date_format(current_timestamp(), 'yyyy-MM-dd')).show()

+----+----------+
|test|   current|
+----+----------+
|test|2020-09-09|
+----+----------+
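
Note that date_format returns a string column. If you want an actual date type in the column that gets saved, a small variation (sketch) is to apply to_date to the timestamp instead:

    from pyspark.sql.functions import current_timestamp, to_date

    # keep a DateType column instead of a formatted string
    df.withColumn('current', to_date(current_timestamp())).printSchema()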

Upvotes: 5
