SanjanaSanju
SanjanaSanju

Reputation: 297

Flattening the array of a dataframe column into separate columns and corresponding values in Spark scala

Below is my dataframe. Im getting this dataframe by converting the java Json list to dataframe.

+---+--------------------------------------------------------------------------------------------------+
|   |                              Value                                                              |
+---+---+----------------------------------------------------------------------------------------------+
|  1| {"Name":"john","type":"On(r) TV: Channel","desc_lang":"en~en~en~en","copyright":"Copyright 2021"}|
|-- --------------------------------------------------------------------------------------------------+
|  2| {"Name":"Dane","type":"On(r) TV: Prgrm","desc_lang":"FR~FR~FR~FR","copyright":"Copyright 2022"}  |
+------------------------------------------------------------------------------------------------------+

The required output is as below.

+---+---------------------+---+-------+----------------+
|Name|    type            | desc_lang | copyright      |
+----+--------------------+-----------+-------+--------+
|john| On(r) TV: Channel  | en~en~en~en |Copyright 2021|
|Dane| On(r) TV: Prgrm    | FR~FR~FR~FR |Copyright 2022|
+----+--------------------+-------------+--------------+

This is only the sample data, I actually have around 180 columns that needs to be flattened to the above tabular format. Below is the code I tried splitting, but it did not given me the desired output.

val dfcollect = DF.withColumn("finalop", split($"Value", ":"))

Could someone please assist on how to achieve this output.

Upvotes: 0

Views: 383

Answers (1)

mck
mck

Reputation: 42332

You can use from_json and star expand the resulting struct:

val df2 = df.select(
    from_json(
        col("Value"), 
        schema_of_json(df.select("Value").head().getString(0))
    ).as("Value")
).select("Value.*")

df2.show
+----+-------------+-----------+---------------+
|Name|    copyright|  desc_lang|           type|
+----+-------------+-----------+---------------+
|john|Copyright2021|en~en~en~en|On(r)TV:Channel|
|Dane|Copyright2022|FR~FR~FR~FR|  On(r)TV:Prgrm|
+----+-------------+-----------+---------------+

Upvotes: 1

Related Questions