Reputation: 24366
Having dates in one column, how to create a column containing ISO week date?
ISO week date is composed of year, week number and weekday.
year
function.weekofyear
.dayofweek
cannot do it.Example dataframe:
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
('1977-12-31',),
('1978-01-01',),
('1978-01-02',),
('1978-12-31',),
('1979-01-01',),
('1979-12-30',),
('1979-12-31',),
('1980-01-01',)],
['my_date']
).select(F.col('my_date').cast('date'))
df.show()
#+----------+
#| my_date|
#+----------+
#|1977-12-31|
#|1978-01-01|
#|1978-01-02|
#|1978-12-31|
#|1979-01-01|
#|1979-12-30|
#|1979-12-31|
#|1980-01-01|
#+----------+
Desired result:
+----------+-------------+
| my_date|iso_week_date|
+----------+-------------+
|1977-12-31| 1977-W52-6|
|1978-01-01| 1977-W52-7|
|1978-01-02| 1978-W01-1|
|1978-12-31| 1978-W52-7|
|1979-01-01| 1979-W01-1|
|1979-12-30| 1979-W52-7|
|1979-12-31| 1980-W01-1|
|1980-01-01| 1980-W01-2|
+----------+-------------+
Upvotes: 4
Views: 5483
Reputation: 39
An alernative solution that uses .withColumn()
to concatenate the year, "-W" and the iso week number, where weeks 1-9 are padded with a "0".
import pyspark.sql.functions as f
df.withColumn("iso_week_date", f.concat(f.year(col('my_date')), lit("-W"), f.lpad(f.weekofyear(col('my_date')), 2, "0")))
Upvotes: 0
Reputation: 689
Spark SQL extract
makes this much easier.
iso_year
= F.expr("EXTRACT(YEAROFWEEK FROM my_date)")
iso_weekday
= F.expr("EXTRACT(DAYOFWEEK_ISO FROM my_date)")
So, building off of the other answers with the use of concat_ws
:
import pyspark.sql.functions as F
df.withColumn(
'iso_week_date',
F.concat_ws(
"-",
F.expr("EXTRACT(YEAROFWEEK FROM my_date)"),
F.lpad(F.weekofyear('my_date'), 3, "W0"),
F.expr("EXTRACT(DAYOFWEEK_ISO FROM my_date)")
)
).show()
#+----------+-------------+
#| my_date|iso_week_date|
#+----------+-------------+
#|1977-12-31| 1977-W52-6|
#|1978-01-01| 1977-W52-7|
#|1978-01-02| 1978-W01-1|
#|1978-12-31| 1978-W52-7|
#|1979-01-01| 1979-W01-1|
#|1979-12-30| 1979-W52-7|
#|1979-12-31| 1980-W01-1|
#|1980-01-01| 1980-W01-2|
#+----------+-------------+
Upvotes: 6
Reputation: 32650
Your solution is already nice, maybe you could shorten it by simplifying the calculations:
iso_weekday
= (dayofweek(my_date) + 5)%7 + 1
iso_year
= year(date_add(my_date, 4 - iso_weekday))
Which gives you:
import pyspark.sql.functions as F
df.withColumn(
'iso_week_date',
F.concat_ws(
"-",
F.year(F.expr("date_add(my_date, 4 - (dayofweek(my_date) + 5) % 7 + 1)")),
F.lpad(F.weekofyear('my_date'), 3, "W0"),
(F.dayofweek('my_date') + 5) % 7 + 1
)
).show()
#+----------+-------------+
#| my_date|iso_week_date|
#+----------+-------------+
#|1977-12-31| 1977-W52-6|
#|1978-01-01| 1977-W52-7|
#|1978-01-02| 1978-W01-1|
#|1978-12-31| 1978-W52-7|
#|1979-01-01| 1979-W01-1|
#|1979-12-30| 1979-W52-7|
#|1979-12-31| 1980-W01-1|
#|1980-01-01| 1980-W01-2|
#+----------+-------------+
Upvotes: 4
Reputation: 24366
First, one could create rules for columns for year and weekday. Then, concatenate them using concat_ws
and lpad
.
week_from_prev_year = (F.month('my_date') == 1) & (F.weekofyear('my_date') > 9)
week_from_next_year = (F.month('my_date') == 12) & (F.weekofyear('my_date') == 1)
iso_year = F.when(week_from_prev_year, F.year('my_date') - 1) \
.when(week_from_next_year, F.year('my_date') + 1) \
.otherwise(F.year('my_date'))
iso_weekday = F.when(F.dayofweek('my_date') != 1, F.dayofweek('my_date')-1).otherwise(7)
iso_week_date = F.concat_ws('-', iso_year, F.lpad(F.weekofyear('my_date'), 3, 'W0'), iso_weekday)
df2 = df.withColumn('iso_week_date', iso_week_date)
df2.show()
#+----------+-------------+
#| my_date|iso_week_date|
#+----------+-------------+
#|1977-12-31| 1977-W52-6|
#|1978-01-01| 1977-W52-7|
#|1978-01-02| 1978-W01-1|
#|1978-12-31| 1978-W52-7|
#|1979-01-01| 1979-W01-1|
#|1979-12-30| 1979-W52-7|
#|1979-12-31| 1980-W01-1|
#|1980-01-01| 1980-W01-2|
#+----------+-------------+
Upvotes: 0