Reputation: 93
A panda newbie here that's struggling to understand why I'm unable to completely flatten a JSON I receive from an API. I need a Dataframe with all the data that is returned by the API, however I need all nested data to be expanded and given it's own columns for me to be able to use it.
The JSON I receive is as follows:
[
{
"query":{
"id":"1596487766859-3594dfce3973bc19",
"name":"test"
},
"webPage":{
"inLanguages":[
{
"code":"en"
}
]
},
"product":{
"name":"Test",
"description":"Test2",
"mainImage":"image1.jpg",
"images":[
"image2.jpg",
"image3.jpg"
],
"offers":[
{
"price":"45.0",
"currency":"€"
}
],
"probability":0.9552192
}
}
]
Running pd.json_normalize(data)
without any additional parameters shows the nested values price and currency in the product.offers column. When I try to separate these out into their own columns with the following:
pd.json_normalize(data,record_path=['product',meta['product',['offers']]])
I end up with the following error:
f"{js} has non list value {result} for path {spec}. "
Any help would be much appreciated.
Upvotes: 0
Views: 3254
Reputation: 31226
I've used this technique a few times
pd.json_normalize()
to discover the columnsmeta
parameter by inspecting this and the original JSON
. NB possible index out of range herelist
drives record_path
paramproduct/images
is a list
so it gets named 0
. rename it
data = [{'query': {'id': '1596487766859-3594dfce3973bc19', 'name': 'test'},
'webPage': {'inLanguages': [{'code': 'en'}]},
'product': {'name': 'Test',
'description': 'Test2',
'mainImage': 'image1.jpg',
'images': ['image2.jpg', 'image3.jpg'],
'offers': [{'price': '45.0', 'currency': '€'}],
'probability': 0.9552192}}]
# build default to get column names
df = pd.json_normalize(data)
# from column names build the list that gets sent to meta param
mymeta = [[s for s in c.split(".")] for c in df.columns ]
# exclude lists from meta - this will fail
mymeta = [l for l in mymeta if not isinstance(data[0][l[0]][l[1]], list)]
# you can build df from either of the product lists NOT both
df1 = pd.json_normalize(data, record_path=[["product","offers"]], meta=mymeta)
df2 = pd.json_normalize(data, record_path=[["product","images"]], meta=mymeta).rename(columns={0:"image"})
# want them together - you can merge them. note columns heavily overlap so remove most columns from df2
df1.assign(foo=1).merge(
df2.assign(foo=1).drop(columns=[c for c in df2.columns if c!="image"]), on="foo").drop(columns="foo")
Upvotes: 1