Wen Hsiao

Reputation: 631

How to get day of week in SparkSQL?

I am trying to select all records created on Sunday through SparkSQL. I have tried the following, but in vain.

SELECT * FROM mytable WHERE DATEPART(WEEKDAY, create_time) = 0
SELECT * FROM mytable WHERE strftime("%w", create_time) = 0

How to get day of week in SparkSQL?

Upvotes: 22

Views: 58449

Answers (4)

Powers

Reputation: 19308

Here's an example that shows the output from the different options. Suppose you have this DataFrame.

+----------+
| some_date|
+----------+
|2021-01-10|
|2021-01-11|
|2021-01-12|
|2021-01-13|
|2021-01-14|
|2021-01-15|
|2021-01-16|
|      null|
+----------+
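
For reference, here is a minimal sketch of how such a DataFrame could be built (it assumes an active SparkSession named spark; the column name some_date matches the output above):

import org.apache.spark.sql.functions.col
import spark.implicits._

// Build the sample DataFrame; the trailing null row shows the null handling.
val df = Seq(
  "2021-01-10", "2021-01-11", "2021-01-12", "2021-01-13",
  "2021-01-14", "2021-01-15", "2021-01-16", null
).toDF("some_date")
  .withColumn("some_date", col("some_date").cast("date"))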

Here are the results of using the dayofweek, date_format, and dayOfWeekStr (from spark-daria) functions.

import org.apache.spark.sql.functions.{col, date_format, dayofweek}
import com.github.mrpowers.spark.daria.sql.functions._

df
  .withColumn("dayofweek", dayofweek(col("some_date")))
  .withColumn("date_format", date_format(col("some_date"), "EEEE"))
  .withColumn("dayOfWeekStr", dayOfWeekStr(col("dayofweek")))
  .show()
+----------+---------+-----------+------------+
| some_date|dayofweek|date_format|dayOfWeekStr|
+----------+---------+-----------+------------+
|2021-01-10|        1|     Sunday|         Sun|
|2021-01-11|        2|     Monday|         Mon|
|2021-01-12|        3|    Tuesday|         Tue|
|2021-01-13|        4|  Wednesday|         Wed|
|2021-01-14|        5|   Thursday|         Thu|
|2021-01-15|        6|     Friday|         Fri|
|2021-01-16|        7|   Saturday|         Sat|
|      null|     null|       null|        null|
+----------+---------+-----------+------------+

The dayofweek output is best for date arithmetic with date_add or date_sub, as described in this post.
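
For instance, a sketch (my own example, not taken from the linked post; passing a Column as the number of days to date_sub requires Spark 3+) that snaps each date back to the Sunday starting its week:

import org.apache.spark.sql.functions.{col, date_sub, dayofweek}

// dayofweek returns 1 for Sunday, so subtracting (dayofweek - 1) days
// moves every date back to the Sunday that begins its week.
val withWeekStart = df.withColumn(
  "week_start",
  date_sub(col("some_date"), dayofweek(col("some_date")) - 1)
)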

The date_format solution is best for customizing dates for a given format (e.g., for a human-readable deliverable).

Some date functions, like next_day, take a day of the week in string form as an argument, and that's when dayOfWeekStr comes in handy.
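
For example, a quick sketch of next_day with a literal day string (dayOfWeekStr produces exactly these "Sun"/"Mon"-style abbreviations):

import org.apache.spark.sql.functions.{col, next_day}

// next_day returns the first date strictly after some_date that falls
// on the given day of the week.
val withNextSunday = df.withColumn("next_sunday", next_day(col("some_date"), "Sun"))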

Upvotes: 8

MJeremy

Reputation: 1240

This works for me:

spark.sql("select dayofweek(time) as dow from some_table")

Here, time needs to be in a date format.
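
Applied to the original question (a sketch reusing the asker's mytable and create_time names; dayofweek returns 1 for Sunday):

// Keep only the rows created on a Sunday.
val sundays = spark.sql("SELECT * FROM mytable WHERE dayofweek(create_time) = 1")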

Upvotes: 4

MrChristine

Reputation: 1551

If create_time is a Unix timestamp (UTC epoch seconds), you can use the following to filter for specific days in SparkSQL. I used Spark 1.6.1:

select id, date_format(from_unixtime(created_utc), 'EEEE')
from testTable
where date_format(from_unixtime(created_utc), 'EEEE') = 'Wednesday'

If you specify 'EEEE', the day of the week is spelled out completely. You can use 'E' for the shortened version, e.g. Wed. You can find more info here:

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
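
A small sketch contrasting the two patterns (reusing testTable and created_utc from the query above):

// 'EEEE' spells the weekday out ("Wednesday"); 'E' abbreviates it ("Wed").
spark.sql("""
  SELECT date_format(from_unixtime(created_utc), 'EEEE') AS full_name,
         date_format(from_unixtime(created_utc), 'E')    AS short_name
  FROM testTable
""").show()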

Upvotes: 8

Olena

Reputation: 411

Spark 1.5.0 has a date_format function that accepts a format as an argument. This format returns the name of a weekday from a timestamp:

select date_format(my_timestamp, 'EEEE') from ....

Result: e.g. 'Tuesday'
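
Applied to the original question, a sketch that filters for Sundays this way (mytable and create_time are the asker's names):

// Compare the formatted weekday name directly to keep only Sundays.
val sundays = spark.sql(
  "SELECT * FROM mytable WHERE date_format(create_time, 'EEEE') = 'Sunday'"
)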

Upvotes: 41
