blessonm
blessonm

Reputation: 71

Concatenate keys with first element in the values array in a MapType column

The schema of the dataframe is given below.

|-- idMap: map (nullable = true)
|    |-- key: string
|    |-- value: array (valueContainsNull = true)
|    |    |-- element: struct (containsNull = true)
|    |    |    |-- id: string (nullable = true)
|    |    |    |-- linked: boolean (nullable = true)

If there are 3 keys in a row for example, I'm trying to convert this to a new string column of the format key1:id;key2:id;key3:id where id is part of the element at index 0.

What I've tried is

  1. Collect the keys to a list
  2. Create a list of columns from the list of keys
val expr = new scala.collection.mutable.ListBuffer[org.apache.spark.sql.Column]
keyList.foldLeft(expr)((expr, key) => expr += (lit(key), lit(":"), col("idMap")(key)(0)("id"), lit(";")))
  1. Add a new column with the list of columns passed to concat
val finalDf = df.withColumn("concatColumn", concat(expr.toList:_*))

It's giving me a null column so I'm assuming this approach is flawed. Any input would be appreciated.

Edit: @mck's answer works. Also using concat_ws in step 3 will work as well.

val finalDf = df.withColumn("concatColumn", concat_ws(expr.toList:_*))

Upvotes: 0

Views: 286

Answers (1)

mck
mck

Reputation: 42392

If you have Spark 3, you can use transform_values to transform the map column to get your desired output.

// sample dataframe
val df = spark.sql("select map('key1', array(struct('id1' id, true linked)), 'key2', array(struct('id2' id, false linked))) idMap")

val df2 = df.withColumn(
    "concatColumn", 
    expr("""
        concat_ws(';', 
            map_values(
                transform_values(
                    idMap, 
                    (k, v) -> concat(k, ':', transform(v, y -> y.id)[0])
                )
            )
        ) 
    """)
)

df2.show(false)
+-----------------------------------------------+-----------------+
|idMap                                          |concatColumn     |
+-----------------------------------------------+-----------------+
|[key1 -> [[id1, true]], key2 -> [[id2, false]]]|key1:id1;key2:id2|
+-----------------------------------------------+-----------------+

Upvotes: 1

Related Questions