How to append a value from exploded value in dataframe in pyspark

Question

The data is:

data = [{"_id":"Inst001","Type":"AAAA", "Model001":[{"_id":"Mod001", "Name": "FFFF"},
                                                    {"_id":"Mod0011", "Name": "FFFF4"}]},
        {"_id":"Inst002", "Type":"BBBB", "Model001":[{"_id":"Mod002", "Name": "DDD"}]}]

I need to build a DataFrame like this:

pid	_id	Name
Inst001	Mod001	FFFF
Inst001	Mod0011	FFFF4
Inst002	Mod002	DDD

My planned approach is:

1. Explode "Model001".
2. Append the top-level _id to the exploded DataFrame. But how can this append be done in PySpark?

Is there a built-in method in PySpark for this?

mck · Accepted Answer

Create a DataFrame with a proper schema, then apply inline to the Model001 column. inline expands an array of structs into one row per element, with one column per struct field:

df = spark.createDataFrame(
    data, 
    '_id string, Type string, Model001 array<struct<_id:string, Name:string>>'
).selectExpr('_id as pid', 'inline(Model001)')

df.show(truncate=False)
+-------+-------+-----+
|pid    |_id    |Name |
+-------+-------+-----+
|Inst001|Mod001 |FFFF |
|Inst001|Mod0011|FFFF4|
|Inst002|Mod002 |DDD  |
+-------+-------+-----+
