newinPython

Reputation: 313

Adding date & calendar week columns in a PySpark DataFrame

I'm using Spark 2.4.5. I want to add two new columns, date & calendar week, to my PySpark DataFrame df. So I tried the following code:

from pyspark.sql.functions import lit
df = df.withColumn('timestamp', '2020-05-01')
df.show()

But I'm getting the error message: AssertionError: col should be Column

Can you explain how to add the date and calendar week columns?

Upvotes: 0

Views: 2016

Answers (2)

Looks like you missed the lit function in your code. Here's what you were looking for:

from pyspark.sql.functions import lit

df = df.withColumn("date", lit("2020-05-01"))

This is your answer if you want to hardcode the date. If you want to derive the current timestamp programmatically, I'd recommend using a UDF.
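For example, a minimal sketch of that UDF approach (the names here are illustrative; note that Spark also ships a built-in F.current_date, which avoids the UDF overhead entirely):

from datetime import date

from pyspark.sql import functions as F
from pyspark.sql.types import DateType

# Sketch: a zero-argument UDF that returns today's date for every row
today_udf = F.udf(lambda: date.today(), DateType())
df = df.withColumn("date", today_udf())

# Built-in alternative, no UDF needed:
# df = df.withColumn("date", F.current_date())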

Upvotes: 1

mpSchrader

Reputation: 932

I see two questions here: First, how to cast a string to a date. Second, how to get the week of the year from a date.

Cast string to date

You can either simply use cast("date") or the more explicit F.to_date, which lets you specify the input format.

from pyspark.sql import functions as F

df = df.withColumn("date", F.to_date("timestamp", "yyyy-MM-dd"))
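The cast alternative mentioned above is even shorter. As a sketch, assuming the timestamp column holds ISO-formatted yyyy-MM-dd strings:

from pyspark.sql import functions as F

# cast("date") parses ISO yyyy-MM-dd strings without an explicit format
df = df.withColumn("date", F.col("timestamp").cast("date"))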

Extract week of year

Using date_format allows you to format a date column to any desired pattern. w is the week of the year; W would be the week of the month.

df = df.withColumn("week_of_year", F.date_format("date", "w"))
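Note that date_format returns the week as a string. If you'd rather have an integer, a sketch using the built-in weekofyear (which follows ISO 8601 week numbering, so values near year boundaries can differ from the w pattern):

from pyspark.sql import functions as F

# weekofyear returns the ISO week number as an integer
df = df.withColumn("week_of_year", F.weekofyear("date"))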

Related Question: pyspark getting weeknumber of month

Upvotes: 1
