Reputation: 209
I am trying to add one column to my existing PySpark DataFrame using the withColumn method. I want to insert the current date into this column. My source has no date column, so I am adding this current-date column to my DataFrame and saving the DataFrame to a table, so that later I can use the column for tracking. I am using the code below:
df2=df.withColumn("Curr_date",datetime.now().strftime('%Y-%m-%d'))
Here df is my existing DataFrame, and I want to save df2 as a table with the Curr_date column. However, withColumn expects a Column (an existing column or a value wrapped in lit), not the plain string returned by datetime.now().strftime('%Y-%m-%d'). Can someone please guide me on how to add this date column to my DataFrame?
Upvotes: 4
Views: 36269
Reputation: 15258
Use either lit or current_date:
from datetime import datetime
from pyspark.sql import functions as F

# Option 1: capture the driver's current date once as a literal string
df2 = df.withColumn("Curr_date", F.lit(datetime.now().strftime("%Y-%m-%d")))
# OR
# Option 2: let Spark supply the current date as a DateType column
df2 = df.withColumn("Curr_date", F.current_date())
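The two options behave differently: F.lit(...) bakes in the driver's clock at the moment the DataFrame is defined, while F.current_date() is resolved by Spark when the query runs. A minimal pure-Python sketch of the first behavior (no Spark needed, just the standard datetime module):

```python
from datetime import datetime

# What F.lit(datetime.now().strftime("%Y-%m-%d")) captures: a plain
# string computed once on the driver, not a live Spark expression.
frozen = datetime.now().strftime("%Y-%m-%d")

# The value is fixed from here on; every row written later would get
# this same string, even if the job keeps running past midnight.
assert len(frozen) == 10 and frozen[4] == "-" and frozen[7] == "-"
```

For a tracking column written once per batch load, either option is usually fine; for long-running jobs, current_date() avoids stamping rows with a stale driver-side date.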
Upvotes: 10
Reputation: 3173
current_timestamp() is good, but it is evaluated once at the start of query execution, so every row in the same query gets the same value. If you prefer the timestamp of the processing time of each row, then you may use the method below:
from pyspark.sql.functions import expr
df2 = df.withColumn('current', expr("reflect('java.time.LocalDateTime', 'now')"))
Upvotes: 4
Reputation: 13541
There is a Spark function current_timestamp():
from pyspark.sql.functions import current_timestamp, date_format

# Format the current timestamp as a yyyy-MM-dd string (Java date pattern)
df.withColumn('current', date_format(current_timestamp(), 'yyyy-MM-dd')).show()
+----+----------+
|test| current|
+----+----------+
|test|2020-09-09|
+----+----------+
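Note that date_format uses Java datetime patterns ('yyyy-MM-dd'), not Python strftime codes ('%Y-%m-%d'). For the same calendar date the two produce identical strings; a quick stdlib check, using the sample date 2020-09-09 from the output above:

```python
from datetime import date

# Python strftime equivalent of Spark's Java pattern 'yyyy-MM-dd'
py_style = date(2020, 9, 9).strftime("%Y-%m-%d")
assert py_style == "2020-09-09"
```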
Upvotes: 5