Reputation: 17676
How can I construct a boolean filter condition dynamically in Spark SQL? Given:
val d = Seq(1, 2, 3, 5, 6).toDF
d.filter(col("value") === 1 or col("value") === 3).show
How can I replicate this dynamically:
val desiredThings = Seq(1,3)
I tried to build the filter:
val myCondition = desiredThings.map(col("value") === _)
d.filter(myCondition).show
but fail with:
overloaded method value filter with alternatives:
org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row]
cannot be applied to (Seq[org.apache.spark.sql.Column])
When executing
d.filter(myCondition).show
Also, when experimenting with foldLeft:
val myCondition = desiredThings.foldLeft()((result, entry) => result && col(c.columnCounterId) === entry)
I have compile errors.
How can I adapt the code to dynamically generate the filter predicate?
Upvotes: 1
Views: 4698
Reputation: 1
Just use isin:
d.filter(col("value").isin(desiredThings: _*))
but if you really want to use foldLeft, you have to provide the base condition:
d.filter(desiredThings.foldLeft(lit(false))(
  (acc, x) => acc || col("value") === x
))
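The map-based attempt from the question can also be completed by combining the resulting Seq[Column] with reduce instead of foldLeft. A minimal sketch (same imports as above; unlike the foldLeft version it needs no lit(false) base, but it assumes desiredThings is non-empty, since reduce throws on an empty collection):

```scala
import org.apache.spark.sql.functions.col

// Build one predicate per desired value, then OR them together.
val myCondition = desiredThings.map(col("value") === _).reduce(_ || _)
d.filter(myCondition).show
```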
Alternatively, to use with filter or where, you can generate a SQL expression using:
val filterExpr = desiredThings.map(v => s"value = $v").mkString(" or ")
And then use it like
d.filter(filterExpr).show
// or
d.where(filterExpr).show
//+-----+
//|value|
//+-----+
//| 1|
//| 3|
//+-----+
Upvotes: 5