Reputation: 10153
I have a dataframe df with the following columns:
ts: Timestamp
val: String
From my master df, I want to select only the rows that match a certain ts value. I can achieve that using between like:
df.filter($"ts".between(targetDate, targetDate))
Here targetDate is the date I want to filter my df on. Is there an equivalent equality method, such as df.filter($"ts".equal(targetDate))?
Upvotes: 2
Views: 731
Reputation: 330423
As you can see in the Column documentation, you can use the === method to compare a column's values with a value of Any type.
import sqlContext.implicits._  // needed outside spark-shell for toDF and the $-notation

val df = sc.parallelize(
  ("2016-02-24T22:54:17Z", "foo") ::
  ("2010-08-01T00:00:12Z", "bar") ::
  Nil
).toDF("ts", "val").withColumn("ts", $"ts".cast("timestamp"))
df.where($"ts" === "2010-08-01T00:00:12Z").show(10, false)
// +---------------------+---+
// |ts |val|
// +---------------------+---+
// |2010-08-01 02:00:12.0|bar|
// +---------------------+---+
If you want to be explicit about the types, you can replace
=== "2010-08-01T00:00:12Z"
with
=== lit("2010-08-01T00:00:12Z").cast("timestamp")
There is also the Column.equalTo method, designed for Java interoperability:
df.where($"ts".equalTo("2010-08-01T00:00:12Z")).show(10, false)
Finally, Spark supports NULL-safe equality operators (<=>, Column.eqNullSafe), but these require a Cartesian product in Spark < 1.6 (see SPARK-11111).
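For completeness, a minimal sketch of the NULL-safe variants on the same df (unlike ===, they evaluate to true when both sides are NULL instead of returning NULL):

// lit imported from org.apache.spark.sql.functions as above
df.where($"ts" <=> lit("2010-08-01T00:00:12Z").cast("timestamp")).show(10, false)
df.where($"ts".eqNullSafe(lit("2010-08-01T00:00:12Z").cast("timestamp"))).show(10, false)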
Upvotes: 2