ZygD

Reputation: 24356

Start of the week on Monday in Spark

This is my dataset:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('2021-02-07',),('2021-02-08',)], ['date']) \
    .select(
        F.col('date').cast('date'),
        F.date_format('date', 'EEEE').alias('weekday'),
        F.dayofweek('date').alias('weekday_number')
    )
df.show()
#+----------+-------+--------------+
#|      date|weekday|weekday_number|
#+----------+-------+--------------+
#|2021-02-07| Sunday|             1|
#|2021-02-08| Monday|             2|
#+----------+-------+--------------+

dayofweek numbers the days starting from Sunday (1 = Sunday, ..., 7 = Saturday), but I need the numbering to start on Monday.

Desired result:

+----------+-------+--------------+
|      date|weekday|weekday_number|
+----------+-------+--------------+
|2021-02-07| Sunday|             7|
|2021-02-08| Monday|             1|
+----------+-------+--------------+

Upvotes: 2

Views: 5051

Answers (3)

Henk

Reputation: 53

The dayofweek function has no parameter for choosing the first day of the week. To make the numbering start on Monday, first subtract 1 (here fx is presumably the alias for pyspark.sql.functions and Datetmp is the date column):

.withColumn('DayOfWeekNrMon', fx.dayofweek('Datetmp').cast('integer') - 1)

Then correct Sunday, which is now 0:

.withColumn('DayOfWeekNrMon', fx.when(fx.col('DayOfWeekNrMon') == 0, 7).otherwise(fx.col('DayOfWeekNrMon')))
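The subtract-then-correct logic above can also be folded into a single modular expression. The sketch below checks that arithmetic in plain Python, without Spark; monday_first and spark_dayofweek are hypothetical helper names introduced only for illustration, and spark_dayofweek merely imitates the Sunday-first numbering that dayofweek produces.

```python
from datetime import date, timedelta


def monday_first(sunday_first: int) -> int:
    """Map a Sunday-first weekday number (1 = Sunday ... 7 = Saturday)
    to a Monday-first one (1 = Monday ... 7 = Sunday)."""
    return (sunday_first + 5) % 7 + 1


def spark_dayofweek(d: date) -> int:
    """Plain-Python stand-in for Sunday-first numbering (hypothetical helper,
    not a Spark call): 1 = Sunday ... 7 = Saturday."""
    return d.isoweekday() % 7 + 1


# Check the mapping over a full week starting on Sunday 2021-02-07.
start = date(2021, 2, 7)
week = [start + timedelta(days=i) for i in range(7)]
converted = [monday_first(spark_dayofweek(d)) for d in week]
print(converted)
```

If the arithmetic carries over unchanged, the Spark column would be something like ((fx.dayofweek('Datetmp') + 5) % 7) + 1, which avoids the separate when/otherwise step; treat that expression as an assumption to verify against your own data.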

Upvotes: 0

ZygD

Reputation: 24356

Use the SQL function weekday, which returns 0 for Monday through 6 for Sunday, and add 1:

F.expr('weekday(date) + 1')

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('2021-02-07',),('2021-02-08',)], ['date']) \
    .select(
        F.col('date').cast('date'),
        F.date_format('date', 'EEEE').alias('weekday'),
        F.expr('weekday(date) + 1').alias('weekday_number'),
    )
df.show()
#+----------+-------+--------------+
#|      date|weekday|weekday_number|
#+----------+-------+--------------+
#|2021-02-07| Sunday|             7|
#|2021-02-08| Monday|             1|
#+----------+-------+--------------+

Upvotes: 0

Christophe

Reputation: 696

You can try this:

date_format(col("date"), "u").alias('weekday_number')

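For some reason, the "u" pattern is not listed in Spark's documentation of datetime patterns for formatting. Note that date_format returns a string, so cast the result if you need an integer.

You might also need to add this configuration line: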
spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')

Thanks for your feedback and very happy to help =)

Upvotes: 2
