Jeff Hall
Jeff Hall

Reputation: 21

Turning JSON into dataframe with pandas

I'm trying to get a data frame but keep running into various error messages depending on the arguments I specify in read.json after I specify my file.

I've run through many of the arguments in the pandas.read_json documentation, but haven't been able to identify a solution.

import pandas
json_file = "https://gis.fema.gov/arcgis/rest/services/NSS/OpenShelters/MapServer/0/query?where=1%3D1&outFields=*&returnGeometry=false&outSR=4326&f=json"
pandas.read_json(json_file)

I'm trying to get a data frame but keep running into various error messages depending on the arguments I specify in read.json after I specify my file.

Upvotes: 1

Views: 393

Answers (1)

Code Different
Code Different

Reputation: 93161

Because the JSON is not directly convertible to DataFrame. read_json works only with a few formats defined by the orient parameter. Your JSON doesn't follow any of the allowed formats so you need to manipulate the JSON before converting it to a data frame.

Let's take a high level look at your JSON:

{
    "displayFieldName": ...,
    "fieldAliases": {...},
    "fields": {...},
    "features": [...]
}

I'm gonna fathom a guess and assume the features node is what you want. Let's div deeper into features:

"features": [
    {
        "attributes": {
            "OBJECTID": 1,
            "SHELTER_ID": 223259,
            ...
        }
    },
    {
        "attributes": {
            "OBJECTID": 2,
            "SHELTER_ID": 223331,
            ...
        }
    },
    ...
]

features contains a list of objects, each having an attributes node. The data contained in the attributes node is what you actually want.

Here's the code

import pandas as pd
import json
from urllib.request import urlopen

json_file = "https://gis.fema.gov/arcgis/rest/services/NSS/OpenShelters/MapServer/0/query?where=1%3D1&outFields=*&returnGeometry=false&outSR=4326&f=json"
data = urlopen(json_file).read()
raw_json = json.loads(data)
formatted_json = [feature['attributes'] for feature in raw_json['features']]

formatted_json is now a list of dictionaries containing the data we are after. It is no longer JSON. To create the data frame:

df = pd.DataFrame(formatted_json)

Upvotes: 1

Related Questions