Reputation: 631
I am trying to select all records created on a Sunday through SparkSQL. I have tried the following, without success.
SELECT * FROM mytable WHERE DATEPART(WEEKDAY, create_time) = 0
SELECT * FROM mytable WHERE strftime("%w", create_time) = 0
How to get day of week in SparkSQL?
Upvotes: 22
Views: 58449
Reputation: 19308
Here's an example that shows the output from the different options. Suppose you have this DataFrame.
+----------+
| some_date|
+----------+
|2021-01-10|
|2021-01-11|
|2021-01-12|
|2021-01-13|
|2021-01-14|
|2021-01-15|
|2021-01-16|
| null|
+----------+
Here are the results of using the dayofweek, date_format, and dayOfWeekStr (from spark-daria) functions.
import org.apache.spark.sql.functions._
import com.github.mrpowers.spark.daria.sql.functions._
df
.withColumn("dayofweek", dayofweek(col("some_date")))
.withColumn("date_format", date_format(col("some_date"), "EEEE"))
.withColumn("dayOfWeekStr", dayOfWeekStr(col("dayofweek")))
.show()
+----------+---------+-----------+------------+
| some_date|dayofweek|date_format|dayOfWeekStr|
+----------+---------+-----------+------------+
|2021-01-10| 1| Sunday| Sun|
|2021-01-11| 2| Monday| Mon|
|2021-01-12| 3| Tuesday| Tue|
|2021-01-13| 4| Wednesday| Wed|
|2021-01-14| 5| Thursday| Thu|
|2021-01-15| 6| Friday| Fri|
|2021-01-16| 7| Saturday| Sat|
| null| null| null| null|
+----------+---------+-----------+------------+
The dayofweek output is best for date arithmetic with date_add or date_sub, as described in this post.
The date_format solution is best for customizing dates for a given format (e.g. for a human-readable deliverable).
Some date functions, like next_day, take a day in string form as an argument, and that's when dayOfWeekStr comes in handy.
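For instance, a sketch of how next_day pairs with the abbreviated day strings that dayOfWeekStr produces (this assumes the DataFrame above has been registered as a temp view named df):

SELECT some_date,
       next_day(some_date, 'Sat') AS next_saturday  -- 'Sat' matches the dayOfWeekStr form
FROM df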
Upvotes: 8
Reputation: 1240
This works for me:
spark.sql("select dayofweek(time) as dow from some_table")
where time needs to be of date (or timestamp) type.
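Applied to the original question, a sketch of the Sunday filter (dayofweek returns 1 for Sunday; the function is available in Spark SQL from 2.3 onwards):

-- Keep only rows whose create_time falls on a Sunday (dayofweek = 1)
SELECT * FROM mytable WHERE dayofweek(create_time) = 1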
Upvotes: 4
Reputation: 1551
If create_time is a Unix timestamp (seconds since the UTC epoch), you can use the following to filter out specific days in SparkSQL. I used Spark 1.6.1:
select id, date_format(from_unixtime(created_utc), 'EEEE') from testTable where date_format(from_unixtime(created_utc), 'EEEE') == "Wednesday"
If you specify 'EEEE', the day of the week is spelled out completely. You can use 'E' to specify the shortened version, e.g. Wed. You can find more info here: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
Upvotes: 8
Reputation: 411
Spark 1.5.0 has a date_format function that accepts a format pattern as an argument. With the 'EEEE' pattern it returns the name of the day of the week for a timestamp:
select date_format(my_timestamp, 'EEEE') from ....
Result: e.g. 'Tuesday'
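So a sketch of the original question's Sunday filter, written against the question's mytable:

-- Keep only rows whose create_time falls on a Sunday
SELECT * FROM mytable
WHERE date_format(create_time, 'EEEE') = 'Sunday'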
Upvotes: 41