Reputation: 518
I have a set of filter conditions for a Spark SQL query in the form of
Map<String, List<String>> conditions
Let's say that the initial dataset is
Dataset<Row> dataSet
I want to use each entry in the map, say (StringA, ListA), (StringB, ListB), and so on, in the filter, where each entry becomes one branch of an OR condition:
(col("my_col1").equalTo("StringA").and(col("my_col2").isInCollection(ListA)))
OR
(col("my_col1").equalTo("StringB").and(col("my_col2").isInCollection(ListB)))
OR
(col("my_col1").equalTo("StringC").and(col("my_col2").isInCollection(ListC)))
....
The dataset is a Cassandra table with (my_col1, my_col2) as the partition key.
I don't expect to have more than 5-6 entries in the map.
Upvotes: 0
Views: 269
Reputation: 21
If you just need to match one column against a list of values, use isin:
d.filter(col("value").isin(desiredThings: _*))
But if you really want to use foldLeft, you have to provide the base condition:
d.filter(desiredThings.foldLeft(lit(false))(
  (acc, x) => acc || col("value") === x
))
Alternatively, to use with filter or where, you can generate a SQL expression using:
val filterExpr = desiredThings.map( v => s"value = $v").mkString(" or ")
and then use it like:
d.filter(filterExpr).show
// or
d.where(filterExpr).show
//+-----+
//|value|
//+-----+
//|    1|
//|    3|
//+-----+
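Applying the same SQL-expression idea to the original Map<String, List<String>>, here is a sketch in plain Java that builds the OR-of-ANDs filter string (column names my_col1/my_col2 are taken from the question; the quoting assumes string values with no embedded single quotes):

```java
import java.util.*;
import java.util.stream.Collectors;

public class FilterExpr {

    // Builds "(my_col1 = 'A' AND my_col2 IN ('x', 'y')) OR (...)" from the map.
    static String buildFilterExpr(Map<String, List<String>> conditions) {
        return conditions.entrySet().stream()
            .map(e -> {
                // Quote each value for the SQL IN (...) list.
                String inList = e.getValue().stream()
                    .map(v -> "'" + v + "'")
                    .collect(Collectors.joining(", "));
                return "(my_col1 = '" + e.getKey() + "' AND my_col2 IN (" + inList + "))";
            })
            .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, List<String>> conditions = new LinkedHashMap<>();
        conditions.put("StringA", Arrays.asList("a1", "a2"));
        conditions.put("StringB", Arrays.asList("b1"));
        System.out.println(buildFilterExpr(conditions));
        // (my_col1 = 'StringA' AND my_col2 IN ('a1', 'a2')) OR (my_col1 = 'StringB' AND my_col2 IN ('b1'))
    }
}
```

The resulting string can then be passed straight to dataSet.filter(filterExpr) or dataSet.where(filterExpr). With only 5-6 map entries the expression stays small, so there is no practical downside versus building Column objects with a fold.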
Upvotes: 1