mk2080
mk2080

Reputation: 922

pd.read_json() to dataframe in pandas

I have never worked with json files before. Currently this is what I am doing:

df = pd.read_json("precincts-with-results.geojson.gz")
df['features']

This is the result from above:

{'type': 'Feature',
 'properties': {'GEOID': '05047-1-A (Oz Wd 1)',
  'votes_dem': 79,
  'votes_rep': 279,
  'votes_total': 366,
  'votes_per_sqkm': 54.2,
  'pct_dem_lead': -54.6},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[-93.88536364311612, 35.483758439321655],
     [-93.8840470388143, 35.483727092097084],
     [-93.88403177163875, 35.483726728784056],
     [-93.88403177478405, 35.48372661335151],
     [-93.87956152062023, 35.483586322546344],
     [-93.87520339804045, 35.48339873745174],
     [-93.87534656033012, 35.480428139370346],
     [-93.87604589142236, 35.48045051399295], ...

I want a data frame that would look like this:

GEOID                   votes_dem     votes_rep     votes_total      votes_per_sqkm       pct_dem_lead
05047-1-A (Oz Wd 1)       79               279         366             54.2                 -54.6 
  

You can download the dataset here (size 264 mb): https://int.nyt.com/newsgraphics/elections/map-data/2020/national/precincts-with-results.geojson.gz

Help and code appreciated !!

Upvotes: 3

Views: 2234

Answers (3)

Eelco van Vliet
Eelco van Vliet

Reputation: 1238

This works for me

import pandas as pd

filename = 'precincts-with-results.geojson.gz'
df = pd.read_json(filename)
features = df['features']
properties = [_['properties'] for _ in features.values]
collect_properties = {_: list() for _ in properties[0].keys()}
for record in properties:
    for col_name, value in record.items():
        collect_properties[col_name].append(value)

new_df = pd.DataFrame.from_dict(collect_properties).set_index('GEOID')
print(new_df)

Result looks like

                        votes_dem  votes_rep  ...  votes_per_sqkm  pct_dem_lead
GEOID                                          ...                              
05047-1-A (Oz Wd 1)           79.0      279.0  ...            54.2         -54.6
05149-11 - Dutch Creek         6.0       31.0  ...             0.3         -67.6
05081-Franklin Township       53.0      383.0  ...             3.8         -73.3
05027-McNeil City             64.0       68.0  ...            41.9          -2.9
05027-Taylor Township         67.0      342.0  ...             1.7         -65.0
                            ...        ...  ...             ...           ...
56007-01-01                  173.0      300.0  ...            21.8         -26.1
56045-05-01                   70.0      390.0  ...           259.5         -66.7
56045-05-02                   67.0      376.0  ...            66.5         -68.1
56045-05-03                   63.0      419.0  ...           141.4         -71.5
56041-130                    168.0      654.0  ...             1.7         -57.1
[146596 rows x 5 columns]

Upvotes: 1

Nk03
Nk03

Reputation: 14949

I've implemented a code to flatten JSON and extract selected fields -

How to extract fields from nested json and save in a data structure

Here's the code which will do the task (just update the field list and also replace 'test1.json' with your JSON file path)-

import json
from pprint import pprint
import pandas as pd

def flatten_json(nested_json, fields):
    out = []
    temp = {}

    def flatten(x, name=''):
        nonlocal temp
        if type(x) is dict:
            temp = {}
            for a in x:
                flatten(x[a], a)
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a)
                i += 1
        elif name in fields:
            temp[name] = x
            out.append(temp)
    flatten(nested_json)
    return out

with open('test.json', 'r') as f:
    json1 = json.loads(f.read())

# mention the required fields here.
fields = ['GEOID','votes_dem','votes_rep','votes_total','votes_per_sqkm','pct_dem_lead']

result = (flatten_json(json1, fields))


df = pd.DataFrame(result)
df.drop_duplicates(inplace=True)

Upvotes: 0

cosmoem
cosmoem

Reputation: 36

Your json file doesn't have the correct format to be immediately transformed into a DataFrame by pandas. At first, you need to extract the data from the json file into a dictionary:

with open(os.path.join(sys.path[0], "precincts-with-results.geojson")) as json_file:
    data = json.load(json_file)

Then, you have to transform the data to fit your needs. Since the dictionary is nested, you need to normalize the data. Now I'm not entirely sure how exactly you want your DataFrame to look like in the end, so I'm not including code here, but I think you can use pd.json_normalize for this. It returns a DataFrame. The documentation on how this works can be found here: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

Upvotes: 0

Related Questions