Reputation: 71
I'm using pyspark 2.1 and i have a dataframe with two columns with date format like this:
Column A , START_DT , END_DT
1 , 2016-01-01 , 2020-02-04
16 , 2017-02-23 , 2017-12-24
I want to filter for a certain date (for example 2018-12-31
) between the date from START_DT
and END_DT
(in the example there, the second row would be filtered).
Both START_DT
and END_DT
columns are already in date format, i was looking for a method like the sql:
SELECT *
FROM MYTABLE WHERE '2018-12-31' BETWEEN start_dt AND end_dt
Upvotes: 5
Views: 25885
Reputation: 43504
If you have a working SQL query, you can always register your DataFrame as a temp table and use spark.sql()
:
df.createOrReplaceTempView("MYTABLE")
spark.sql("SELECT * FROM MYTABLE WHERE '2018-12-31' BETWEEN start_dt AND end_dt").show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
Another option is to pass an expression to where
:
df.where("'2018-12-31' BETWEEN start_dt AND end_dt").show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
One more way is to use pyspark.sql.Column.between
with pyspark.sql.functions.lit
, but you'll have to use pyspark.sql.functions.expr
in order to use a column value as a parameter.
from pyspark.sql.functions import lit, expr
test_date = "2018-12-31"
df.where(lit(test_date).between(expr('start_dt'), expr('end_dt'))).show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
Lastly, you can implement your own version of between
:
from pyspark.sql.functions import col
df.where((col("start_dt") <= lit(test_date)) & (col("end_dt") >= lit(test_date))).show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
Upvotes: 10