Reputation: 465
I'm working with a dataframe with this schema:
root
|-- c: long (nullable = true)
|-- data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- key: string (nullable = true)
| | |-- value: string (nullable = true)
I'm trying to filter this dataframe based on an element ("key1", "value1") in the array column data, i.e. if this element exists in data, keep the row, otherwise drop it. I tried

df.filter(col("data").contains(Array("value1", "key1")))

but it didn't work. I also tried

val f = Array("value1", "key1")
df.filter(col("data").contains(f))

but that didn't work either.
Any help please?
Upvotes: 0
Views: 134
Reputation: 41957
A straightforward approach would be to use a udf function, since udf functions let you apply logic row by row on primitive datatypes (which is what your requirement calls for: checking every key and value of the struct elements in the array column data).
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

//udf to check for key1 in key and value1 in value of every struct in the array field
def containsUdf = udf((data: Seq[Row]) => data.exists(row =>
  row.getAs[String]("key") == "key1" && row.getAs[String]("value") == "value1"))
//calling the udf function in the filter
val filteredDF = df.filter(containsUdf(col("data")))
so filteredDF should contain your desired output.
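As a minimal end-to-end sketch (assuming a local SparkSession; the sample rows and the KV case class are made up here to match your schema, where toDF on a case class produces the key/value struct fields):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions._

object FilterDemo {
  // case class so the array elements become struct<key: string, value: string>
  case class KV(key: String, value: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-demo").getOrCreate()
    import spark.implicits._

    // row 1 contains the (key1, value1) struct, row 2 does not
    val df = Seq(
      (1L, Seq(KV("key1", "value1"), KV("key2", "value2"))),
      (2L, Seq(KV("key3", "value3")))
    ).toDF("c", "data")

    def containsUdf = udf((data: Seq[Row]) => data.exists(row =>
      row.getAs[String]("key") == "key1" && row.getAs[String]("value") == "value1"))

    // keeps only the row whose data array contains the (key1, value1) struct
    df.filter(containsUdf(col("data"))).show(false)

    spark.stop()
  }
}

On Spark 2.4+ you could alternatively avoid the udf with the exists higher-order SQL function, e.g. df.filter(expr("exists(data, x -> x.key = 'key1' and x.value = 'value1')")).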
Upvotes: 4