Reputation: 57
I have a PySpark dataframe that looks like this:
+--+--------------------------------------+
|id|json                                  |
+--+--------------------------------------+
|1 |{"attr1": "value1"}                   |
|2 |{"attr2": "value2", "attr3": "value3"}|
+--+--------------------------------------+
root
|-- id: string (nullable = true)
|-- json: string (nullable = true)
How do I convert it into a new dataframe which will look like this:
+--+-----+------+
|id|attr |value |
+--+-----+------+
|1 |attr1|value1|
|2 |attr2|value2|
|2 |attr3|value3|
+--+-----+------+
(I tried to google for a solution with no success; apologies if it's a duplicate.) Thanks!
Upvotes: 0
Views: 137
Reputation: 26676
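Please check the schema; the data looks to me like a map type. If the json column is of MapType, use map_entries to extract its entries as an array of key/value structs, then explode that array into one row per entry.
from pyspark.sql.functions import explode, map_entries
# Data and schema are placeholders for your own rows and schema
df = spark.createDataFrame(Data, schema)
df.withColumn('attri', explode(map_entries('json'))).select('id', 'attri.*').show()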
Upvotes: 1