Neel

Reputation: 10153

Spark 1.5.2: Filtering a dataframe in Scala

I have a dataframe df with the following columns:

ts: Timestamp
val: String

From my master df, I want to select only the rows that match a certain ts value. I can achieve that with between, like df.filter($"ts".between(targetDate, targetDate)), where targetDate is the date I want to filter my df on. Is there an equivalent equality method, such as df.filter($"ts".equal(targetDate))?
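For reference, a minimal sketch of that workaround (targetDate is just a placeholder value here, and df is the dataframe described above):

// targetDate is a placeholder; between accepts both bounds inclusively
val targetDate = java.sql.Timestamp.valueOf("2010-08-01 00:00:12")
df.filter($"ts".between(targetDate, targetDate)).show()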

Upvotes: 2

Views: 731

Answers (1)

zero323

Reputation: 330423

As you can see in the Column documentation, you can use the === method to compare a column's values against a value of any type.

=== Method

// assumes a spark-shell session, where sc and sqlContext.implicits._
// (providing $ and toDF) are already in scope
val df = sc.parallelize(
  ("2016-02-24T22:54:17Z", "foo") :: 
  ("2010-08-01T00:00:12Z", "bar") ::
  Nil
).toDF("ts", "val").withColumn("ts", $"ts".cast("timestamp"))

df.where($"ts" === "2010-08-01T00:00:12Z").show(10, false)
// +---------------------+---+
// |ts                   |val|
// +---------------------+---+
// |2010-08-01 02:00:12.0|bar|
// +---------------------+---+
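Note that the displayed value (02:00:12.0) differs from the input string (00:00:12Z): the cast interprets the trailing Z as UTC, while show renders the timestamp in the JVM's default time zone (UTC+2 in this run).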

If you want to be explicit about types, you can replace

=== "2010-08-01T00:00:12Z"

with

=== lit("2010-08-01T00:00:12Z").cast("timestamp")
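Put together, with the import spelled out (lit lives in org.apache.spark.sql.functions; in spark-shell it may already be imported):

import org.apache.spark.sql.functions.lit

df.where($"ts" === lit("2010-08-01T00:00:12Z").cast("timestamp")).show(10, false)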

There is also the Column.equalTo method, designed for Java interoperability:

df.where($"ts".equalTo("2010-08-01T00:00:12Z")).show(10, false)

Finally, Spark supports null-safe equality operators (<=>, Column.eqNullSafe), but in Spark < 1.6 these require a Cartesian product (see SPARK-11111).
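For illustration, a minimal sketch of the null-safe variant in the same spark-shell session (withNulls is a made-up example dataframe containing a null timestamp):

// None becomes a NULL timestamp after the cast
val withNulls = sc.parallelize(
  (Some("2010-08-01T00:00:12Z"), "bar") ::
  (None, "baz") ::
  Nil
).toDF("ts", "val").withColumn("ts", $"ts".cast("timestamp"))

// === evaluates to NULL when either side is NULL, so the row is dropped;
// <=> (equivalently Column.eqNullSafe) treats two NULLs as equal
withNulls.where($"ts" === lit(null)).count()  // 0
withNulls.where($"ts" <=> lit(null)).count()  // 1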

Upvotes: 2
