I have to generate a WHERE condition based on a case class/DataFrame.
For example, I have sample data like the one below, which I can get from a case class/DataFrame with 4 columns. The DataFrame has a lot of data, and I have to filter it based on the id: for a given id I have to generate the whereQuery.
columns are (id, col1, col2, col3)
| id  | col1      | col2      | col3      |
|-----|-----------|-----------|-----------|
| "1" | "col1vr1" | "col2vr1" | "col3vr1" |
| "1" | "col1vr2" | "col2vr2" | "col3vr2" |
For the above data I have to generate a where clause like the one below:
( col("col1")<=>col1vr1 && col("col2")<=>col2vr1 && col("col3") <=> col3vr1 ) || ( col("col1")<=>col1vr2 && col("col2")<=>col2vr2 && col("col3") <=> col3vr2 )
so that I can apply it to a WHEN condition: when( finalColumn, "We don't have any records for this rule" )
// Here finalColumn is the generated query
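The OR-of-ANDs shape above can be built by mapping each row to one conjunction and joining the results, rather than concatenating pieces by hand. A minimal sketch in plain Scala that assembles the expression as a string (no Spark needed; the `Row` class and sample values are just the two rows from the table above):

```scala
// Hypothetical sketch: build the where-query text from the sample rows.
case class Row(id: String, col1: String, col2: String, col3: String)

val rows = List(
  Row("1", "col1vr1", "col2vr1", "col3vr1"),
  Row("1", "col1vr2", "col2vr2", "col3vr2")
)

// One "( ... && ... && ... )" group per row, joined with " || ".
val whereQuery = rows
  .map(r => s"""( col("col1")<=>${r.col1} && col("col2")<=>${r.col2} && col("col3")<=>${r.col3} )""")
  .mkString(" || ")
```

With Spark, the same map-then-combine pattern would produce `Column` values instead of strings and fold them with `||`.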
case class test(id: String, col1: String, col2: String, col3: String)
Test data:
val testmap = List(
  test("1", "col1v",  "col2va", "col3va"),
  test("1", "col1v",  "col2va", "col3vb"),
  test("1", "col1va", "col2va", "col3vc"),
  test("1", "col1va", "col2va", "col3vd"),
  test("1", "col1vb", "col2vb", "col3vd"),
  test("1", "col1vb", "col2vb", "col3ve"),
  test("1", "col1vb", "col2va", "col3vd"),
  test("1", "col1vb", "col2va", "col3vf"),
  test("1", "col1vc", "col2vb", "col3vf"),
  test("1", "col1vc", "col2vc", "col3vf"),
  test("2", "col1v",  "col2va", "col3va"),
  test("2", "col1v",  "col2va", "col3vb"),
  test("2", "col1vb", "col2vb", "col3ve"),
  test("2", "col1vb", "col2vb", "col3vd"),
  test("2", "col1vc", "col2vc", "col3vf"),
  test("3", "col1va", "col2va", "col3va"),
  test("3", "col1vb", "col2vb", "col3vb"),
  test("3", "col1vc", "col2vc", "col3vc")
)
Code snippet:
var whereCond = scala.collection.mutable.ArrayBuffer[Column]()
val t1 = testmap.filter( p => p.id.equalsIgnoreCase("1") ) // This is called per iteration; we need one rule per iteration
t1.map( rule => {
  if ( !rule.col1.equalsIgnoreCase("all") ) {
    whereCond.+=(col("col1")<=>rule.col1 + " && ")
    if ( !rule.col2.equalsIgnoreCase("all") ) {
      whereCond.+=(col("col2")<=>rule.col2 + " && ")
    }
    if ( !rule.col3.equalsIgnoreCase("all") ) {
      whereCond.+=(col("col3")<=>rule.col3 + " || ")
    }
  }
})
var finalColumn = col("")
whereCond.toArray[Column].map(c => { finalColumn.+=(c) })
finalColumn
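One likely reason the snippet above misbehaves: `col("col1")<=>rule.col1 + " && "` appends the string `" && "` to `rule.col1` before the comparison, and `finalColumn.+=(c)` is arithmetic plus on Columns, not logical AND. The usual pattern is to build one combined condition per rule and fold the rules with OR. A small sketch of that fold, modeled with plain Scala predicates instead of Spark Columns (assumption: each rule, including the "all" wildcard handling, becomes a `Test => Boolean`):

```scala
case class Test(id: String, col1: String, col2: String, col3: String)

val t1 = List(
  Test("1", "col1v",  "col2va", "col3va"),
  Test("1", "col1va", "col2va", "col3vc")
)

// One conjunction per rule; a column whose rule value is "all" matches anything.
val perRule: List[Test => Boolean] = t1.map { rule => (row: Test) =>
  (rule.col1.equalsIgnoreCase("all") || row.col1 == rule.col1) &&
  (rule.col2.equalsIgnoreCase("all") || row.col2 == rule.col2) &&
  (rule.col3.equalsIgnoreCase("all") || row.col3 == rule.col3)
}

// Fold with OR: a row matches if any rule's conjunction matches.
val matches: Test => Boolean = row => perRule.exists(p => p(row))
```

In Spark the same shape would be `t1.map(rule => ...Column conjunction...).reduce(_ || _)`.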
But I am not getting the expected results.
I also tried the code snippet below:
var columnData = col("")
val df = testmap.toDF.where($"id" <=> "3").distinct
val col1List = df.select("col1").rdd.map(r => r.getString(0)).collect().toList
val col2List = df.select("col2").rdd.map(r => r.getString(0)).collect().toList
val col3List = df.select("col3").rdd.map(r => r.getString(0)).collect().toList
for (i <- 0 to col1List.size - 1)
  if (columnData == col(""))
    columnData = col("col1") <=> col1List(i) && col("col2") <=> col2List(i) && col("col3") <=> col3List(i)
  else
    columnData = columnData || (col("col1") <=> col1List(i) && col("col2") <=> col2List(i) && col("col3") <=> col3List(i))
columnData
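The indexed loop and the `columnData == col("")` sentinel can be replaced by zipping the three lists and reducing. A sketch of that rewrite over strings (assumption: the three lists line up row-by-row, as they do in the snippet above; with Spark the `reduceLeft` would combine Columns via `||` instead):

```scala
// Stand-ins for the values collected from the DataFrame.
val col1List = List("col1vc", "col1va", "col1vb")
val col2List = List("col2vc", "col2va", "col2vb")
val col3List = List("col3vc", "col3va", "col3vb")

// One conjunction per row; reduceLeft chains them with OR, no sentinel needed.
val columnData = col1List.zip(col2List).zip(col3List)
  .map { case ((c1, c2), c3) => s"(col1 <=> $c1 AND col2 <=> $c2 AND col3 <=> $c3)" }
  .reduceLeft((acc, next) => s"$acc OR $next")
```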
Whenever we do an && or || operation on a Column, Scala automatically adds brackets around both operands.
For the above code I am getting the output below:
(((((col1 <=> col1vc) AND (col2 <=> col2vc)) AND (col3 <=> col3vc))
OR (((col1 <=> col1va) AND (col2 <=> col2va)) AND (col3 <=> col3va)))
OR (((col1 <=> col1vb) AND (col2 <=> col2vb)) AND (col3 <=> col3vb)))
But I am expecting the output to be:
col1 <=> col1vc AND col2 <=> col2vc AND col3 <=> col3vc
OR (col1 <=> col1va AND col2 <=> col2va AND col3 <=> col3va )
OR (col1 <=> col1vb AND col2 <=> col2vb AND col3 <=> col3vb )
Answer:
Whenever we do an && or || operation on a Column, Scala automatically adds brackets around both of them
That's not Scala. That's plain SQL operator precedence, where (quoting an answer by charles-bretana):

And has precedence over Or, so, even if a <=> a1 Or a2

If this behavior is not desired, the expression should be parenthesized.
scala> import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions.col
scala> val col1 = col("col1")
col1: org.apache.spark.sql.Column = col1
scala> val col2 = col("col2")
col2: org.apache.spark.sql.Column = col2
scala> val col3 = col("col3")
col3: org.apache.spark.sql.Column = col3
scala> (col1 <=> "col1vc" and col2 <=> "col1vc")
res0: org.apache.spark.sql.Column = ((col1 <=> col1vc) AND (col2 <=> col1vc))
scala> col1 <=> "col1vc" and col2 <=> "col1vc" and col3 <=> "col3vc"
res1: org.apache.spark.sql.Column = (((col1 <=> col1vc) AND (col2 <=> col1vc)) AND (col3 <=> col3vc))
scala> col1 <=> "col1vc" and col2 <=> "col1vc" and (col3 <=> "col3vc" or (col1 <=> "col1va" and col2 <=> "col2va" and col3 <=> "col3va"))
res2: org.apache.spark.sql.Column = (((col1 <=> col1vc) AND (col2 <=> col1vc)) AND ((col3 <=> col3vc) OR (((col1 <=> col1va) AND (col2 <=> col2va)) AND (col3 <=> col3va))))
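In other words, the extra parentheses Spark prints are just a rendering of how the expression tree was built; they don't change the result, because AND already binds tighter than OR. A quick check with plain Scala booleans, whose && and || follow the same precedence:

```scala
// For every boolean combination, a && b || c equals the fully
// parenthesized form (a && b) || c — the parentheses are redundant.
val combos = for {
  a <- List(true, false)
  b <- List(true, false)
  c <- List(true, false)
} yield (a && b || c) == ((a && b) || c)

val allEqual = combos.forall(identity)
```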