Reputation: 922
I have never worked with json
files before. Currently this is what I am doing:
df = pd.read_json("precincts-with-results.geojson.gz")
df['features']
This is the result from above:
{'type': 'Feature',
'properties': {'GEOID': '05047-1-A (Oz Wd 1)',
'votes_dem': 79,
'votes_rep': 279,
'votes_total': 366,
'votes_per_sqkm': 54.2,
'pct_dem_lead': -54.6},
'geometry': {'type': 'MultiPolygon',
'coordinates': [[[[-93.88536364311612, 35.483758439321655],
[-93.8840470388143, 35.483727092097084],
[-93.88403177163875, 35.483726728784056],
[-93.88403177478405, 35.48372661335151],
[-93.87956152062023, 35.483586322546344],
[-93.87520339804045, 35.48339873745174],
[-93.87534656033012, 35.480428139370346],
[-93.87604589142236, 35.48045051399295], ...
I want a data frame that would look like this:
GEOID votes_dem votes_rep votes_total votes_per_sqkm pct_dem_lead
05047-1-A (Oz Wd 1) 79 279 366 54.2 -54.6
You can download the dataset here (size 264 mb): https://int.nyt.com/newsgraphics/elections/map-data/2020/national/precincts-with-results.geojson.gz
Help and code appreciated !!
Upvotes: 3
Views: 2234
Reputation: 1238
This works for me
import pandas as pd
filename = 'precincts-with-results.geojson.gz'
df = pd.read_json(filename)
features = df['features']
properties = [_['properties'] for _ in features.values]
collect_properties = {_: list() for _ in properties[0].keys()}
for record in properties:
for col_name, value in record.items():
collect_properties[col_name].append(value)
new_df = pd.DataFrame.from_dict(collect_properties).set_index('GEOID')
print(new_df)
Result looks like
votes_dem votes_rep ... votes_per_sqkm pct_dem_lead
GEOID ...
05047-1-A (Oz Wd 1) 79.0 279.0 ... 54.2 -54.6
05149-11 - Dutch Creek 6.0 31.0 ... 0.3 -67.6
05081-Franklin Township 53.0 383.0 ... 3.8 -73.3
05027-McNeil City 64.0 68.0 ... 41.9 -2.9
05027-Taylor Township 67.0 342.0 ... 1.7 -65.0
... ... ... ... ...
56007-01-01 173.0 300.0 ... 21.8 -26.1
56045-05-01 70.0 390.0 ... 259.5 -66.7
56045-05-02 67.0 376.0 ... 66.5 -68.1
56045-05-03 63.0 419.0 ... 141.4 -71.5
56041-130 168.0 654.0 ... 1.7 -57.1
[146596 rows x 5 columns]
Upvotes: 1
Reputation: 14949
I've implemented a code to flatten JSON and extract selected fields -
How to extract fields from nested json and save in a data structure
Here's the code which will do the task (just update the field list and also replace 'test1.json' with your JSON file path)-
import json
from pprint import pprint
import pandas as pd
def flatten_json(nested_json, fields):
out = []
temp = {}
def flatten(x, name=''):
nonlocal temp
if type(x) is dict:
temp = {}
for a in x:
flatten(x[a], a)
elif type(x) is list:
for i, a in enumerate(x):
flatten(a)
i += 1
elif name in fields:
temp[name] = x
out.append(temp)
flatten(nested_json)
return out
with open('test.json', 'r') as f:
json1 = json.loads(f.read())
# mention the required fields here.
fields = ['GEOID','votes_dem','votes_rep','votes_total','votes_per_sqkm','pct_dem_lead']
result = (flatten_json(json1, fields))
df = pd.DataFrame(result)
df.drop_duplicates(inplace=True)
Upvotes: 0
Reputation: 36
Your json file doesn't have the correct format to be immediately transformed into a DataFrame by pandas. At first, you need to extract the data from the json file into a dictionary:
with open(os.path.join(sys.path[0], "precincts-with-results.geojson")) as json_file:
data = json.load(json_file)
Then, you have to transform the data to fit your needs. Since the dictionary is nested, you need to normalize the data. Now I'm not entirely sure how exactly you want your DataFrame to look like in the end, so I'm not including code here, but I think you can use pd.json_normalize
for this. It returns a DataFrame. The documentation on how this works can be found here: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
Upvotes: 0