Reputation: 1054
I have a DataFrame with two string columns and a third column that holds an array of structs:
|-- music: string (nullable = true)
|-- artist: string (nullable = true)
|-- details: array (nullable = false)
| |-- element: struct (containsNull = true)
| | |-- Genre: string (nullable = true)
| | |-- Origin: string (nullable = true)
Here is some sample data:
music | artist | details
Music_1 | Artist_1 | [{"Genre": "Rock", "Origin": "USA"}]
Music_2 | Artist_3 | [{"Genre": "", "Origin": "USA"}]
Music_3 | Artist_1 | [{"Genre": "Rock", "Origin": "UK"}]
I am trying to do what I assume is a simple operation: concatenate each key and its value with ' - '. Basically, I want to get the following structure:
music | artist | details
Music_1 | Artist_1 | Genre - Rock, Origin - USA
Music_2 | Artist_3 | Genre - , Origin - USA
Music_3 | Artist_1 | Genre - Rock, Origin - UK
To do that, I first tried separating the keys and values into different columns so that I could then concatenate them:
display(df.select(col("music"), col("artist"), posexplode("details").alias("key","value")))
But I got the following result:
music | artist | key | value
Music_1 | Artist_1 | 0 | [{"Genre": "Rock", "Origin": "USA"}]
Music_2 | Artist_3 | 0 | [{"Genre": "", "Origin": "USA"}]
Music_3 | Artist_1 | 0 | [{"Genre": "Rock", "Origin": "UK"}]
This is probably not the best approach. Can anyone help me?
Thanks!
Upvotes: 1
Views: 399
Reputation: 5487
You can use the built-in higher-order function transform() (available since Spark 2.4) to get the desired result.
from pyspark.sql.functions import expr, explode_outer
# df is the input DataFrame from the question
df.withColumn('details', expr("""transform(details, c -> concat_ws(', ',
    concat_ws(' - ', 'Genre', c['Genre']),
    concat_ws(' - ', 'Origin', c['Origin'])))""")) \
    .withColumn('details', explode_outer('details')) \
    .show(truncate=False)
+--------+--------------------------+-------+
|artist |details |music |
+--------+--------------------------+-------+
|Artist_1|Genre - Rock, Origin - USA|Music_1|
|Artist_3|Genre - , Origin - USA |Music_2|
|Artist_1|Genre - Rock, Origin - UK |Music_3|
+--------+--------------------------+-------+
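To make it clearer what the transform() lambda computes for each array element, here is a minimal plain-Python sketch of the same logic, with no Spark involved. The helper names concat_ws and format_details are hypothetical stand-ins written for this illustration, and the third sample row (a null Genre) is an assumption added to show that Spark's concat_ws skips null arguments instead of returning null.

```python
from typing import Dict, List, Optional


def concat_ws(sep: str, *parts: Optional[str]) -> str:
    """Stand-in for Spark SQL concat_ws: join the non-null parts with sep."""
    return sep.join(p for p in parts if p is not None)


def format_details(details: List[Dict[str, Optional[str]]]) -> List[str]:
    """Apply the same expression as the transform() lambda to each struct."""
    return [
        concat_ws(
            ", ",
            concat_ws(" - ", "Genre", c.get("Genre")),
            concat_ws(" - ", "Origin", c.get("Origin")),
        )
        for c in details
    ]


rows = [
    [{"Genre": "Rock", "Origin": "USA"}],
    [{"Genre": "", "Origin": "USA"}],
    [{"Genre": None, "Origin": "UK"}],  # hypothetical row with a null Genre
]

# One list of formatted strings per input row, like the array transform() returns.
formatted = [format_details(r) for r in rows]
for f in formatted:
    print(f)
```

An empty string still produces "Genre - ", as in the Music_2 row above, while a null value drops the separator altogether and leaves just "Genre". If that matters for your data, wrapping the field in coalesce(c['Genre'], '') inside the SQL expression should keep the output format consistent.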
Upvotes: 2