LWNirvana

Reputation: 57

Expand JSON from PySpark DataFrame into name / value pairs

I have a PySpark DataFrame that looks like this:

+--+--------------------------------------+
|id|json                                  |
+--+--------------------------------------+
|1 |{"attr1": "value1"}                   |
|2 |{"attr2": "value2", "attr3": "value3"}|
+--+--------------------------------------+

root
 |-- id: string (nullable = true)
 |-- json: string (nullable = true)

How do I convert it into a new dataframe which will look like this:

+--+-----+------+
|id|attr |value |
+--+-----+------+
|1 |attr1|value1|
|2 |attr2|value2|
|2 |attr3|value3|
+--+-----+------+

(I tried to Google for a solution with no success; apologies if this is a duplicate.) Thanks!

Upvotes: 0

Views: 137

Answers (1)

wwnde

Reputation: 26676

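Please check the schema; this looks to me like a map type. If the column json is of MapType, use map_entries to extract its elements and explode them. (The schema printed in the question shows json as a string, in which case it first has to be parsed into a map, for example with from_json.)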

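from pyspark.sql.functions import explode, map_entries

df = spark.createDataFrame(Data, schema)  # Data/schema: your rows and a schema in which json is a MapType

# map_entries -> array of (key, value) structs; explode -> one row per entry; 'attri.*' -> key, value columns
df.withColumn('attri', explode(map_entries('json'))).select('id', 'attri.*').show()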

Upvotes: 1
