Alex Woolford

Reputation: 4563

JSON to Pandas: is there a more elegant solution?

I have some JSON, returned from an API call, that looks something like this:

{
    "result": {
        "code": "OK",
        "msg": ""
    },
    "report_name": "FAMOUS_DICTATORS",
    "columns": [
        "rank",
        "name",
        "deaths"
    ],
    "data": [
        {
            "row": [
                1,
                "Mao Zedong",
                63000000
            ]
        },
        {
            "row": [
                2,
                "Jozef Stalin",
                23000000
            ]
        }
    ]
}

I'd like to convert the JSON into a Pandas DataFrame:

rank    name            deaths
1       Mao Zedong      63000000
2       Jozef Stalin    23000000

I wrote this and it works, but looks a bit ugly:

import pandas as pd
import json

payload = json.loads(r.content)  # r is the API response; parse it once rather than calling eval()

df = pd.DataFrame(columns=payload['columns'])

# Append one row at a time
for row in payload['data']:
    df.loc[len(df)+1] = row['row']

Is there a more elegant/Pythonic way to do this (e.g. possibly using pandas.io.json.read_json)?

Upvotes: 0

Views: 513

Answers (2)

WGS

Reputation: 14169

The read_json function in pandas can be tricky to use. Unless you are certain that the JSON is valid and that its structure maps cleanly onto a DataFrame, it is safer to break the data down yourself into something pandas can consume reliably.

In your case, I suggest breaking the data down into a list of lists. Out of all that JSON, the only parts you really need are under the data and columns keys.

Try this:

import json
import pandas as pd

# Load the JSON from a local file; for an API response, use json.loads(r.content) instead.
with open("test.json") as f:
    js = json.load(f)

rows = [item["row"] for item in js["data"]]  # Flatten the 'row' entries into a list of lists.
df = pd.DataFrame(rows, columns=js["columns"])
print(df)

This gives me the desired result:

   rank          name    deaths
0     1    Mao Zedong  63000000
1     2  Jozef Stalin  23000000
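
If you want to try this without a test.json file on disk, here is a minimal self-contained sketch of the same idea. The helper name report_to_frame and the RAW string are my own illustrative choices, not part of any API; the payload is just the sample from the question.

```python
import json

import pandas as pd

# Sample payload copied from the question (stands in for r.content).
RAW = """
{
    "result": {"code": "OK", "msg": ""},
    "report_name": "FAMOUS_DICTATORS",
    "columns": ["rank", "name", "deaths"],
    "data": [
        {"row": [1, "Mao Zedong", 63000000]},
        {"row": [2, "Jozef Stalin", 23000000]}
    ]
}
"""


def report_to_frame(payload):
    """Hypothetical helper: build a DataFrame from a parsed report payload."""
    # Pull the list stored under each 'row' key into a list of lists.
    rows = [item["row"] for item in payload["data"]]
    # Let pandas build the frame in one call instead of appending row by row.
    return pd.DataFrame(rows, columns=payload["columns"])


df = report_to_frame(json.loads(RAW))
print(df)
```

Building the frame in a single constructor call also avoids the repeated df.loc assignments, which get slow as the number of rows grows.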

Upvotes: 2

dartdog

Reputation: 10862

See pandas.io.json.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None)

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.json.read_json.html
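
A rough sketch of how read_json might be applied to the question's payload, assuming a reasonably recent pandas where the function is exposed as pd.read_json. With orient="split" it expects top-level "columns" and "data" keys, where "data" is a plain list of rows, so the payload needs a small reshape first. The names RAW and split_payload are illustrative only.

```python
import io
import json

import pandas as pd

# Sample payload copied from the question (stands in for r.content).
RAW = """
{
    "result": {"code": "OK", "msg": ""},
    "report_name": "FAMOUS_DICTATORS",
    "columns": ["rank", "name", "deaths"],
    "data": [
        {"row": [1, "Mao Zedong", 63000000]},
        {"row": [2, "Jozef Stalin", 23000000]}
    ]
}
"""

payload = json.loads(RAW)

# Keep only the keys that orient="split" understands, and unwrap each 'row'.
split_payload = {
    "columns": payload["columns"],
    "data": [item["row"] for item in payload["data"]],
}

# read_json takes a path or file-like object, so wrap the JSON text in StringIO.
df = pd.read_json(io.StringIO(json.dumps(split_payload)), orient="split")
print(df)
```

This round-trips the data through JSON a second time, so for this particular payload it is not obviously simpler than passing the rows straight to pd.DataFrame.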

Upvotes: 0
