Reputation: 313
I'm using Spark 2.4.5. I want to add two new columns, date and calendar week, to my PySpark DataFrame df. So I tried the following code:
from pyspark.sql.functions import lit
df.withColumn('timestamp', '2020-05-01')
df.show()
But I'm getting the error message: AssertionError: col should be Column
Can you explain how to add a date column and a calendar week column?
Upvotes: 0
Views: 2016
Reputation: 351
Looks like you missed the lit function in your code.
Here's what you were looking for:
df = df.withColumn("date", lit('2020-05-01'))
This is your answer if you want to hardcode the date and week. If you want to derive the current date programmatically, you don't need a UDF; use the built-in F.current_date() (or F.current_timestamp()).
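For example, a minimal sketch of the built-in approach (assuming df already exists; the column name date is just an example):
from pyspark.sql import functions as F
# current_date() is evaluated once at the start of the query,
# so every row gets today's date without any UDF
df = df.withColumn("date", F.current_date())
df.show()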
Upvotes: 1
Reputation: 932
I see two questions here: First, how to cast a string to a date. Second, how to get the week of the year from a date.
You can either simply use cast("date") or the more specific F.to_date.
df = df.withColumn("date", F.to_date("timestamp", "yyyy-MM-dd"))
Using F.date_format allows you to format a date column to any desired format. The pattern w is the week of the year; W would be the week of the month.
df = df.withColumn("week_of_year", F.date_format("date", "w"))
Related Question: pyspark getting weeknumber of month
Upvotes: 1