Reputation: 25
I need to add a column having date column which will be the date on Monday of that week. I have a column having week number already.
How can i achieve it using pyspark or spark sql.
Example: week number: 28
Date on monday of week number 28 will be: 2021-07-12
Upvotes: 0
Views: 927
Reputation: 2998
I think you have a date column from which you are getting the week number. You can also get the day number of the week based on the date. Day number start from sunday as 1, monday as 2 till saturday 7 by default based on the dayofweek
function. We will have to apply some custom logic to change this ordering and make monday day number as 1, tuesday as 2 till Sunday as 7.
After that we can subtract day number from the actual date to get the date of the monday for that week.
//sample dataframe creation
df = spark.createDataFrame([(1, "2021-07-11"),(2, "2021-07-15")], ['id', 'date'])
//getting the week number and the day number of the day in the week
from pyspark.sql.functions import *
df1 = df.withColumn("weeknumber",weekofyear(df.date))
.withColumn("day_of_week",dayofweek(df.date))
.withColumn("date_format", date_format(df.date, "EEEE"))
//subtracting the day number from the actual date to get the monday date for that week
df2 = df1.withColumn("Number", when (df1.day_of_week == 1, df1.day_of_week + 6 ).otherwise(df1.day_of_week - 1))
.selectExpr('*', 'date_sub(date, Number - 1) as week_start_monday')
display(df2)
you can see the output as below:
Upvotes: 0
Reputation: 5032
You can use to_date function on your date with 1(day of week: Monday) concatenated, like 202129, where 2021 is year, 29 is week of year, 1 is week day number. Refer to Java Simple Date format for info on date time chars.
Note - Somehow the to_date
is the converting the date to a week prior than the correct week , hence added 7 days to get the actual monday date
Example -
sc = SparkContext.getOrCreate()
sql = SQLContext(sc)
sql.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
sparkDF = sql.createDataFrame([('1','202129')],[ "ID","date_weeknum"])
sparkDF = sparkDF.withColumn("new_date", F.to_date(F.concat("date_weeknum",F.lit("1")), "yyyywwu"))
sparkDF = sparkDF.withColumn("monday_date", F.date_add(F.col('new_date'),7))
sparkDF.show()
+---+------------+----------+-----------+
| ID|date_weeknum| new_date|monday_date|
+---+------------+----------+-----------+
| 1| 202129|2021-07-12| 2021-07-19|
+---+------------+----------+-----------+
sparkDF = sql.createDataFrame([('1','202129')],[ "ID","date_weeknum"])
sparkDF = sparkDF.withColumn("new_date", F.to_date(F.concat("date_weeknum",F.lit("1")), "yyyywwu"))
sparkDF = sparkDF.withColumn("monday_date", F.date_add(F.col('new_date'),7))
sparkDF.show()
+---+------------+----------+-----------+
| ID|date_weeknum| new_date|monday_date|
+---+------------+----------+-----------+
| 1| 202129|2021-07-12| 2021-07-19|
+---+------------+----------+-----------+
Upvotes: 2