Reputation: 71
The schema of the dataframe is given below.
|-- idMap: map (nullable = true)
| |-- key: string
| |-- value: array (valueContainsNull = true)
| | |-- element: struct (containsNull = true)
| | | |-- id: string (nullable = true)
| | | |-- linked: boolean (nullable = true)
If a row has 3 keys, for example, I'm trying to convert this to a new string column of the format key1:id;key2:id;key3:id,
where each id comes from the element at index 0 of that key's array.
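For example (hypothetical values), a row with idMap = Map("key1" -> [{id: "a", linked: true}], "key2" -> [{id: "b", linked: false}], "key3" -> [{id: "c", linked: true}]) should produce the string key1:a;key2:b;key3:c.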
What I've tried is
val expr = new scala.collection.mutable.ListBuffer[org.apache.spark.sql.Column]
// for each key, append the columns: key literal, ":", idMap(key)(0).id, ";"
keyList.foldLeft(expr)((expr, key) => expr += (lit(key), lit(":"), col("idMap")(key)(0)("id"), lit(";")))
val finalDf = df.withColumn("concatColumn", concat(expr.toList: _*))
It's giving me a null column, so I'm assuming this approach is flawed. Any input would be appreciated.
Edit: @mck's answer works. Using concat_ws instead of concat also works, since concat_ws skips null inputs while concat returns null if any input is null (note that concat_ws takes a separator as its first argument; the separators are already in expr, so an empty one is passed here):
val finalDf = df.withColumn("concatColumn", concat_ws("", expr.toList: _*))
Upvotes: 0
Views: 286
Reputation: 42392
If you have Spark 3, you can use transform_values
to transform the map column to get your desired output.
// sample dataframe
val df = spark.sql("select map('key1', array(struct('id1' id, true linked)), 'key2', array(struct('id2' id, false linked))) idMap")
val df2 = df.withColumn(
  "concatColumn",
  expr("""
    concat_ws(';',
      map_values(
        transform_values(
          idMap,
          (k, v) -> concat(k, ':', transform(v, y -> y.id)[0])
        )
      )
    )
  """)
)
df2.show(false)
+-----------------------------------------------+-----------------+
|idMap |concatColumn |
+-----------------------------------------------+-----------------+
|[key1 -> [[id1, true]], key2 -> [[id2, false]]]|key1:id1;key2:id2|
+-----------------------------------------------+-----------------+
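For reference, roughly the same thing can be written with the typed functions in org.apache.spark.sql.functions instead of a SQL expr string (a sketch, assuming Spark 3.0+, where transform_values is available in the Scala API; df3 is just an illustrative name):
import org.apache.spark.sql.functions._

val df3 = df.withColumn(
  "concatColumn",
  concat_ws(";",
    map_values(
      transform_values(
        col("idMap"),
        // build "key:id" from each key and the first element of its value array
        (k, v) => concat_ws(":", k, v.getItem(0).getField("id"))
      )
    )
  )
)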
Upvotes: 1