Sowmya Ravichandran

Reputation: 177

How to append the parent _id to rows exploded from an array column in PySpark

My data is:

data = [{"_id":"Inst001","Type":"AAAA", "Model001":[{"_id":"Mod001", "Name": "FFFF"},
                                                    {"_id":"Mod0011", "Name": "FFFF4"}]},
        {"_id":"Inst002", "Type":"BBBB", "Model001":[{"_id":"Mod002", "Name": "DDD"}]}]

I need to build a dataframe like this:

pid      _id      Name
Inst001  Mod001   FFFF
Inst001  Mod0011  FFFF4
Inst002  Mod002   DDD

My planned approach is:

  1. Explode the "Model001" column.
  2. Append the top-level _id (as pid) to each exploded row. How can this append be done in PySpark?

Is there a built-in PySpark method for this?

Upvotes: 0

Views: 116

Answers (1)

mck

Reputation: 42392

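Create the dataframe with an explicit schema, then apply inline to the Model001 column. inline expands an array of structs into one row per element, with one column per struct field, so the parent _id (aliased here as pid) is repeated on every resulting row: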

# Assumes an existing SparkSession named `spark` and the `data` list from the question.
df = spark.createDataFrame(
    data,
    '_id string, Type string, Model001 array<struct<_id:string, Name:string>>'
).selectExpr('_id as pid', 'inline(Model001)')

df.show(truncate=False)
+-------+-------+-----+
|pid    |_id    |Name |
+-------+-------+-----+
|Inst001|Mod001 |FFFF |
|Inst001|Mod0011|FFFF4|
|Inst002|Mod002 |DDD  |
+-------+-------+-----+

Upvotes: 1
