Georg Heiler
Georg Heiler

Reputation: 17676

spark sql dynamic filter condition

How can I construct a boolean filter condition dynamically in spark sql? Having:

val d = Seq(1, 2, 3, 5, 6).toDF
d.filter(col("value") === 1 or col("value") === 3).show

How can I replicate this dynamically:

val desiredThings = Seq(1,3)

I try to build the filter:

val myCondition = desiredThings.map(col("value") === _)
d.filter(myCondition).show

but fail with:

overloaded method value filter with alternatives:
org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row]
 cannot be applied to (Seq[org.apache.spark.sql.Column])

When executing

d.filter(myCondition).show

Also when experimenting with fold left:

val myCondition = desiredThings.foldLeft()((result, entry) => result && col(c.columnCounterId) === entry)

I have compile errors.

How can I adapt the code to dynamically generate the filter predicate?

Upvotes: 1

Views: 4698

Answers (1)

user9593805
user9593805

Reputation: 1

Just use isin:

d.filter(col("value").isin(desiredThings: _*))

but if you really want to foldLeft you have to provide the base condition:

d.filter(desiredThings.foldLeft(lit(false))(
  (acc, x) => (acc || col("value") === (x)))
)

Alternatively, to use with filter or where, you can generate a SQL expression using:

val filterExpr = desiredThings.map( v => s"value = $v").mkString(" or ")

And then use it like

d.filter(filterExpr).show
// or
d.where(filterExpr).show

//+-----+
//|value|
//+-----+
//|    1|
//|    3|
//+-----+

Upvotes: 5

Related Questions