P. Prunesquallor

Reputation: 571

python pandas - TypeError when parsing JSON: string indices must be integers

The records in the JSON file look like this (please note what "nutrients" looks like):

{
"id": 21441,
"description": "KENTUCKY FRIED CHICKEN, Fried Chicken, EXTRA CRISPY, Wing, meat and skin with breading",
"tags": ["KFC"],
"manufacturer": "Kentucky Fried Chicken",
"group": "Fast Foods",
"portions": [
{
"amount": 1,
"unit": "wing, with skin",
"grams": 68.0
},
...
],
"nutrients": [
{
"value": 20.8,
"units": "g",
"description": "Protein",
"group": "Composition"
},
{
"value": 29.2,
"units": "g",
"description": "Total lipid (fat)",
"group": "Composition"
}
...
]
}

The following is the code from the book exercise*. It includes some wrangling and assembles the nutrients for each food into a single large table:

import pandas as pd
import json

db = pd.read_json("foods-2011-10-03.json")

nutrients = []

for rec in db:
     fnuts = pd.DataFrame(rec["nutrients"])
     fnuts["id"] = rec["id"]
     nutrients.append(fnuts)

However, I get the following error and I can't figure out why:


TypeError                                 Traceback (most recent call last)
<ipython-input-23-ac63a09efd73> in <module>()
      1 for rec in db:
----> 2     fnuts = pd.DataFrame(rec["nutrients"])
      3     fnuts["id"] = rec["id"]
      4     nutrients.append(fnuts)
      5

TypeError: string indices must be integers

*This is an example from the book Python for Data Analysis

Upvotes: 0

Views: 2246

Answers (3)

P. Prunesquallor

Reputation: 571

Amadan answered the question, but I managed to solve it like this prior to seeing his answer:

for i in range(len(db)):
    rec = db.loc[i]
    fnuts = pd.DataFrame(rec["nutrients"])
    fnuts["id"] = rec["id"]
    nutrients.append(fnuts)
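
One caveat with the loop above: db.loc[i] looks rows up by index label, so it only lines up with range(len(db)) while the frame still has its default 0..n-1 index. A minimal sketch of a positional variant using iloc follows. The two-row frame and its index values are made up for illustration and are not the real foods data.

```python
import pandas as pd

# Hypothetical frame whose index no longer starts at 0 (e.g. after filtering).
db = pd.DataFrame(
    {"id": [10, 20], "nutrients": [[{"value": 1.0}], [{"value": 2.0}]]},
    index=[5, 6],
)

nutrients = []
for i in range(len(db)):
    rec = db.iloc[i]  # positional lookup; db.loc[i] would raise KeyError here
    fnuts = pd.DataFrame(rec["nutrients"])
    fnuts["id"] = rec["id"]
    nutrients.append(fnuts)
```

With the default index produced by pd.read_json on a list of records, loc and iloc should behave the same in this loop.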

Upvotes: 0

Amadan

Reputation: 198314

for rec in db iterates over column names, so rec is a string and rec["nutrients"] raises the TypeError. To iterate over rows, use iterrows():

for id, rec in db.iterrows():
    fnuts = pd.DataFrame(rec["nutrients"])
    fnuts["id"] = rec["id"]
    nutrients.append(fnuts)

This is a bit slow though, since iterrows builds a new Series for every row. itertuples is faster, but since you only care about two columns, iterating over those columns directly is probably fastest:

for id, value in zip(db['id'], db['nutrients']):
    fnuts = pd.DataFrame(value)
    fnuts["id"] = id
    nutrients.append(fnuts)

Upvotes: 1

zipa

Reputation: 27869

The code works fine, but for it to work the JSON should look something like this:

[{
"id": 21441,
"description": "KENTUCKY FRIED CHICKEN, Fried Chicken, EXTRA CRISPY, Wing, meat and skin with breading",
"tags": ["KFC"],
"manufacturer": "Kentucky Fried Chicken",
"group": "Fast Foods",
"portions": [
{"amount": 1,
"unit": "wing, with skin",
"grams": 68.0}],
"nutrients": [{
"value": 20.8,
"units": "g",
"description": "Protein",
"group": "Composition"
},
{"description": "Total lipid (fat)",
"group": "Composition",
"units": "g",
"value": 29.2}]}]

This is an example with only one record.
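
One way the original loop can run unchanged, assuming the file is a top-level JSON array like the one above, is to parse it with the standard json module rather than pd.read_json. This gives a plain list of dicts instead of a DataFrame. The sketch below is hedged accordingly: the inline string is a trimmed, hypothetical stand-in for the file contents, and with a real file json.load on an open file handle would play the same role.

```python
import json
import pandas as pd

# Hypothetical one-record file contents: a top-level JSON array of food objects.
raw = """
[{"id": 21441,
  "nutrients": [
    {"value": 20.8, "units": "g", "description": "Protein", "group": "Composition"},
    {"value": 29.2, "units": "g", "description": "Total lipid (fat)", "group": "Composition"}
  ]}]
"""

# json.loads returns a plain list of dicts, so each rec is one food record.
db = json.loads(raw)

nutrients = []
for rec in db:
    fnuts = pd.DataFrame(rec["nutrients"])
    fnuts["id"] = rec["id"]
    nutrients.append(fnuts)
```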

Upvotes: 0
