Reputation: 297
Below is my dataframe. Im getting this dataframe by converting the java Json list to dataframe.
+---+--------------------------------------------------------------------------------------------------+
| | Value |
+---+---+----------------------------------------------------------------------------------------------+
| 1| {"Name":"john","type":"On(r) TV: Channel","desc_lang":"en~en~en~en","copyright":"Copyright 2021"}|
|-- --------------------------------------------------------------------------------------------------+
| 2| {"Name":"Dane","type":"On(r) TV: Prgrm","desc_lang":"FR~FR~FR~FR","copyright":"Copyright 2022"} |
+------------------------------------------------------------------------------------------------------+
The required output is as below.
+---+---------------------+---+-------+----------------+
|Name| type | desc_lang | copyright |
+----+--------------------+-----------+-------+--------+
|john| On(r) TV: Channel | en~en~en~en |Copyright 2021|
|Dane| On(r) TV: Prgrm | FR~FR~FR~FR |Copyright 2022|
+----+--------------------+-------------+--------------+
This is only the sample data, I actually have around 180 columns that needs to be flattened to the above tabular format. Below is the code I tried splitting, but it did not given me the desired output.
val dfcollect = DF.withColumn("finalop", split($"Value", ":"))
Could someone please assist on how to achieve this output.
Upvotes: 0
Views: 383
Reputation: 42332
You can use from_json
and star expand the resulting struct:
val df2 = df.select(
from_json(
col("Value"),
schema_of_json(df.select("Value").head().getString(0))
).as("Value")
).select("Value.*")
df2.show
+----+-------------+-----------+---------------+
|Name| copyright| desc_lang| type|
+----+-------------+-----------+---------------+
|john|Copyright2021|en~en~en~en|On(r)TV:Channel|
|Dane|Copyright2022|FR~FR~FR~FR| On(r)TV:Prgrm|
+----+-------------+-----------+---------------+
Upvotes: 1