PySpark: JSON array of objects into columns

Question

I'm ingesting JSON files into Spark and have come across an object like the one below, nested inside the JSON from the file:

"data": {
  "key1": "v1",
  "key2": [
     {"nk1": "nv1"},
     {"nk2": "nv2"},
     {"nk3": "nv3"}
  ]
}

After reading it in Spark, it changes into the format below:

"data": {
  "key1": "v1",
  "key2": [
     {"nk1": "nv1", "nk2": null, "nk3": null},
     {"nk1": null, "nk2": "nv2", "nk3": null},
     {"nk1": null, "nk2": null, "nk3": "nv3"}
  ]
}

I need them as columns in the Spark DataFrame:

"key1"	"nk1"	"nk2"	"nk3"
"v1"	"nv1"	"nv2"	"nv3"

Please help me with a solution for this. I'm thinking of converting this to a string and using regex. Is there a better solution?

