Driss NEJJAR

Reputation: 978

Is there a way to get keys from a map based on a column condition using spark dataframe?

I have the following dataframe:

val df = Seq(
    (Map("a" -> "1", "b" -> "2", "c" -> "3"), Seq("a","b"))
    ).toDF("internalMap","commonList")
df.show()

+------------------------+----------+
|internalMap             |commonList|
+------------------------+----------+
|[a -> 1, b -> 2, c -> 3]|[a, b]    |
+------------------------+----------+

How can I get the internalMap values whose keys appear in the commonList array?

I have tried to use:

val getMapElements = df.select(map_keys(col("internalMap")).as("internalMapKeys"), map_values(col("internalMap")).as("internalMapValues"))
    
getMapElements.show()

+---------------+-----------------+
|internalMapKeys|internalMapValues|
+---------------+-----------------+
|      [a, b, c]|        [1, 2, 3]|
+---------------+-----------------+

    
getMapElements.select("internalMapValues").where(col("commonList") isin col("internalMapKeys")).show()

+-----------------+
|internalMapValues|
+-----------------+
+-----------------+

But it returns an empty result. The expected output is:

+-----------------+
|internalMapValues|
+-----------------+
|           [1, 2]|
+-----------------+

The main constraint is that only Spark DataFrame functions are allowed in my use case.

Thank you in advance for your help

Upvotes: 0

Views: 666

Answers (1)

s.polam

Reputation: 10382

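Use expr with the transform higher-order function to get the expected result: for each key in commonList, look up the corresponding value in internalMap. Check the code below.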

scala> df
.withColumn(
    "internalMapValues",
    expr("transform(commonList, v -> internalMap[v])") // Look up each commonList element as a key in internalMap.
)
.show(false)
+------------------------+----------+-----------------+
|internalMap             |commonList|internalMapValues|
+------------------------+----------+-----------------+
|[a -> 1, b -> 2, c -> 3]|[a, b]    |[1, 2]           |
+------------------------+----------+-----------------+
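
As a possible alternative, the same lookup can be sketched without a SQL string, assuming Spark 3.0 or later, where transform is also exposed as a Scala function in org.apache.spark.sql.functions. The local SparkSession and the names session, result and values are only for illustration and are not part of the original question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, transform}

// Illustrative local session (assumption); in spark-shell, getOrCreate()
// simply returns the session that already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("map-lookup-sketch")
  .getOrCreate()
import session.implicits._

// Same sample data as in the question.
val df = Seq(
  (Map("a" -> "1", "b" -> "2", "c" -> "3"), Seq("a", "b"))
).toDF("internalMap", "commonList")

// For each element of commonList, use it as a key into internalMap.
val result = df.withColumn(
  "internalMapValues",
  transform(col("commonList"), key => col("internalMap")(key))
)

result.show(false)

// Collect the new column of the single row to inspect it on the driver.
val values = result.select("internalMapValues").as[Seq[String]].head()
```

This should build the same expression as the expr version above, so the choice between the two is mostly a matter of taste. On Spark 2.4 only the expr form is available.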

Upvotes: 1
