user12331

Reputation: 518

How to pass a dynamic list of conditions in Spark SQL

I have a list of filter conditions for a Spark SQL query in the form of

Map<String, List<String>> conditions

Let's say that the initial dataset is

Dataset<Row> dataSet

I want to use each entry in the map, say (StringA, ListA), (StringB, ListB), and so on, in the filter, where each entry becomes one OR condition:

(col("my_col1").equalTo("StringA").and(col("my_col2").isInCollection(ListA)))

OR

(col("my_col1").equalTo("StringB").and(col("my_col2").isInCollection(ListB)))

OR

(col("my_col1").equalTo("StringC").and(col("my_col2").isInCollection(ListC)))
....

The dataset is a Cassandra table with (my_col1, my_col2) as the partition key.

I don't expect to have more than 5-6 entries in the map.

Upvotes: 0

Views: 269

Answers (1)

Aji Tirto Prayogo

Reputation: 21

For a simple list of values on a single column, use isin:

d.filter(col("value").isin(desiredThings: _*))

But if you really want to use foldLeft, you have to provide the base condition:

d.filter(desiredThings.foldLeft(lit(false))(
  (acc, x) => acc || col("value") === x
))

Alternatively, to use with filter or where, you can generate a SQL expression using:

val filterExpr = desiredThings.map( v => s"value = $v").mkString(" or ")

And then use it like

d.filter(filterExpr).show
// or
d.where(filterExpr).show

//+-----+
//|value|
//+-----+
//|    1|
//|    3|
//+-----+
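Applied to the question's Map&lt;String, List&lt;String&gt;&gt;, the same expression-string approach can be sketched as below. This is a Scala sketch, not tested against the asker's data: buildFilterExpr is a hypothetical helper, the column names my_col1/my_col2 are taken from the question, and values are quoted naively, so real code should escape or parameterize them.

```scala
// Build "(my_col1 = 'K' AND my_col2 IN (...)) OR (...)" from the map.
// Each map entry becomes one parenthesized AND clause; the clauses
// are joined with OR, matching the condition layout in the question.
def buildFilterExpr(conditions: Map[String, List[String]]): String =
  conditions
    .map { case (key, values) =>
      val inList = values.map(v => s"'$v'").mkString(", ")
      s"(my_col1 = '$key' AND my_col2 IN ($inList))"
    }
    .mkString(" OR ")
```

The resulting string can then be passed directly to dataSet.filter(...) or dataSet.where(...), just like filterExpr above.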

Upvotes: 1
