Reputation: 91
I'm new to Python and APIs in general, so this is probably a basic question with an easy answer. I'm trying to get data on congressional representatives from ProPublica's API using Python. I can get the REST API call to run, but I'm having problems structuring the resulting JSON data as a dataframe. I think it's because there are multiple nested levels in the data. I tried normalizing the data, but I can only get it to work for the first nested level.
This is the code I have. Please note I've removed my API key, but you can get one quickly and easily here.
# Import programs
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import requests
import json
import time
import csv
### Index 0
# Requesting data through API
payload = {'X-API-Key': 'a876543211234'}
terms = '"trade war"AND"China"'
index = str(0) # 440 is last offset for this call
response = requests.get('https://api.propublica.org/congress/v1/116/house/members.json', headers=payload)
print(response.status_code)
# Formatting JSON files better
json_data = json.loads(response.content.decode("utf-8"))
# Writing Data as String
json_string = json.dumps(json_data)
# Creating Stage 1 dataframe
jdata = json.loads(json_string)
df = pd.DataFrame(jdata)
df2 = pd.DataFrame(df.results)
# Normalizing Data - converts nested data into a regular looking dataframe
normal_data_0 = json_normalize(data=df['results'])
This is what the JSON data looks like. Notice that all of the representatives' data is nested under 'results' and then under 'members':
{'status': 'OK',
'copyright': ' Copyright (c) 2021 Pro Publica Inc. All Rights Reserved.',
'results': [{'congress': '116',
'chamber': 'House',
'num_results': 451,
'offset': 0,
'members': [{'id': 'A000374',
'title': 'Representative',
'short_title': 'Rep.',
'api_uri': 'https://api.propublica.org/congress/v1/members/A000374.json',
'first_name': 'Ralph',
'middle_name': None,
'last_name': 'Abraham',
'suffix': None,
'date_of_birth': '1954-09-16',
'gender': 'M',
'party': 'R',
'leadership_role': '',
'twitter_account': 'RepAbraham',
'facebook_account': 'CongressmanRalphAbraham',
'youtube_account': None,
'govtrack_id': '412630',
'cspan_id': '76236',
'votesmart_id': '155414',
'icpsr_id': '21522',
'crp_id': 'N00036633',
'google_entity_id': '/m/012dwd7_',
'fec_candidate_id': 'H4LA05221',
'url': 'https://abraham.house.gov',
'rss_url': 'https://abraham.house.gov/rss.xml',
'contact_form': None,
'in_office': False,
'cook_pvi': 'R+15',
'dw_nominate': 0.541,
'ideal_point': None,
'seniority': '6',
'next_election': '2020',
'total_votes': 954,
'missed_votes': 377,
'total_present': 0,
'last_updated': '2020-12-31 18:30:50 -0500',
'ocd_id': 'ocd-division/country:us/state:la/cd:5',
'office': '417 Cannon House Office Building',
'phone': '202-225-8490',
'fax': None,
'state': 'LA',
'district': '5',
'at_large': False,
'geoid': '2205',
'missed_votes_pct': 39.52,
'votes_with_party_pct': 94.93,
'votes_against_party_pct': 4.9},
{'id': 'A000370',
'title': 'Representative',
...
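To make the nesting concrete, here is a minimal sketch using a small hand-built dict that mirrors the structure above. `sample` is a hypothetical stand-in for the parsed response, not real API output; only a handful of fields are kept, with values copied from the sample data. The member records sit two levels down: first inside the one-element `results` list, then inside that element's `members` list.

```python
# Hypothetical, trimmed-down stand-in for the parsed API response
sample = {
    "status": "OK",
    "results": [
        {
            "congress": "116",
            "chamber": "House",
            "num_results": 451,
            "offset": 0,
            "members": [
                {"id": "A000374", "first_name": "Ralph", "last_name": "Abraham", "party": "R", "state": "LA"},
                {"id": "A000370", "first_name": "Alma", "last_name": "Adams", "party": "D", "state": "NC"},
            ],
        }
    ],
}

# 'results' is a list holding a single dict...
print(len(sample["results"]))  # 1

# ...and that dict holds the list of member records
members = sample["results"][0]["members"]
print(members[0]["last_name"])  # Abraham
```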
And this is what my dataframe looks like. All of the member data ends up in the 'members' column of the only row, as a single list of dicts:
normal_data_0
  congress chamber  num_results  offset                                            members
0      116   House          451       0  [{'id': 'A000374', 'title': 'Representative', ...
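A minimal reproduction of this behaviour, again assuming a hypothetical two-member stand-in (`sample`) for the response: normalizing the `results` list flattens each of its dicts into one row, but `json_normalize` does not descend into list values on its own, so `members` stays a single cell holding the whole list.

```python
import pandas as pd

# Hypothetical stand-in for the parsed response (two members, few fields)
sample = {
    "status": "OK",
    "results": [
        {
            "congress": "116",
            "chamber": "House",
            "num_results": 451,
            "offset": 0,
            "members": [
                {"id": "A000374", "last_name": "Abraham"},
                {"id": "A000370", "last_name": "Adams"},
            ],
        }
    ],
}

# Only the first level is flattened: one row per element of 'results',
# with the 'members' list kept as one value in that row
first_level = pd.json_normalize(sample["results"])
print(first_level.shape)                    # (1, 5)
print(type(first_level.loc[0, "members"]))  # <class 'list'>
```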
I've tried running the data through json_normalize twice, and also passing both keys, [results, members], as well. Nothing I have tried has worked.
Any suggestions?
Reputation: 62383
The 'results' key is a one-element list, so 'members' can be normalized by selecting the 'members' key from the dict at index 0.
import pandas as pd
import requests
# Requesting data through API
payload = {'X-API-Key': '...'}
terms = '"trade war"AND"China"'
index = str(0) # 440 is last offset for this call
response = requests.get('https://api.propublica.org/congress/v1/116/house/members.json', headers=payload)
# extract the json data from the response
json_data = response.json()
# normalize only members
members = pd.json_normalize(data=json_data['results'][0]['members'])
# alternatively: normalize members and the preceding keys
members = pd.json_normalize(data=json_data['results'][0], record_path=['members'], meta=['congress', 'chamber', 'num_results', 'offset'])
display(members)  # display() exists in Jupyter/IPython; use print(members) in a plain script
id title short_title api_uri first_name middle_name last_name suffix date_of_birth gender party leadership_role twitter_account facebook_account youtube_account govtrack_id cspan_id votesmart_id icpsr_id crp_id google_entity_id fec_candidate_id url rss_url contact_form in_office cook_pvi dw_nominate ideal_point seniority next_election total_votes missed_votes total_present last_updated ocd_id office phone fax state district at_large geoid missed_votes_pct votes_with_party_pct votes_against_party_pct
0 A000374 Representative Rep. https://api.propublica.org/congress/v1/members/A000374.json Ralph None Abraham None 1954-09-16 M R RepAbraham CongressmanRalphAbraham None 412630 76236 155414 21522 N00036633 /m/012dwd7_ H4LA05221 https://abraham.house.gov https://abraham.house.gov/rss.xml None False R+15 0.541 None 6 2020 954.0 377.0 0.0 2020-12-31 18:30:50 -0500 ocd-division/country:us/state:la/cd:5 417 Cannon House Office Building 202-225-8490 None LA 5 False 2205 39.52 94.93 4.90
1 A000370 Representative Rep. https://api.propublica.org/congress/v1/members/A000370.json Alma None Adams None 1946-05-27 F D None RepAdams CongresswomanAdams None 412607 76386 5935 21545 N00035451 /m/02b45d H4NC12100 https://adams.house.gov https://adams.house.gov/rss.xml None False D+18 -0.465 None 8 2020 954.0 26.0 0.0 2020-12-31 18:30:55 -0500 ocd-division/country:us/state:nc/cd:12 2436 Rayburn House Office Building 202-225-1510 None NC 12 False 3712 2.73 99.24 0.65
2 A000055 Representative Rep. https://api.propublica.org/congress/v1/members/A000055.json Robert B. Aderholt None 1965-07-22 M R None Robert_Aderholt RobertAderholt RobertAderholt 400004 45516 441 29701 N00003028 /m/024p03 H6AL04098 https://aderholt.house.gov https://aderholt.house.gov/rss.xml None False R+30 0.369 None 24 2020 954.0 71.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:al/cd:4 1203 Longworth House Office Building 202-225-4876 None AL 4 False 0104 7.44 93.60 6.29
3 A000371 Representative Rep. https://api.propublica.org/congress/v1/members/A000371.json Pete None Aguilar None 1979-06-19 M D None reppeteaguilar reppeteaguilar None 412615 79994 70114 21506 N00033997 /m/0jwv0xf H2CA31125 https://aguilar.house.gov https://aguilar.house.gov/rss.xml None False D+8 -0.291 None 6 2020 954.0 9.0 0.0 2020-12-31 18:30:52 -0500 ocd-division/country:us/state:ca/cd:31 109 Cannon House Office Building 202-225-3201 None CA 31 False 0631 0.94 97.45 2.44
4 A000372 Representative Rep. https://api.propublica.org/congress/v1/members/A000372.json Rick None Allen None 1951-11-07 M R None reprickallen CongressmanRickAllen None 412625 62545 136062 21516 N00033720 /m/0127y9dk H2GA12121 https://allen.house.gov None None False R+9 0.679 None 6 2020 954.0 15.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:ga/cd:12 2400 Rayburn House Office Building 202-225-2823 None GA 12 False 1312 1.57 92.26 7.63
5 A000376 Representative Rep. https://api.propublica.org/congress/v1/members/A000376.json Colin None Allred None 1983-04-15 M D None RepColinAllred None None 412828 None 177357 None N00040989 /m/03d066b H8TX32098 https://allred.house.gov None None False R+5 NaN None 2 2020 954.0 29.0 0.0 2020-12-31 18:30:52 -0500 ocd-division/country:us/state:tx/cd:32 328 Cannon House Office Building 202-225-2231 None TX 32 False 4832 3.04 97.72 2.17
6 A000367 Representative Rep. https://api.propublica.org/congress/v1/members/A000367.json Justin None Amash None 1980-04-18 M I justinamash repjustinamash repjustinamash 412438 1033767 105566 21143 N00031938 /m/0c00p_n https://amash.house.gov https://amash.house.gov/rss.xml None False R+6 NaN None 10 2020 524.0 0.0 10.0 2020-12-31 18:30:47 -0500 ocd-division/country:us/state:mi/cd:3 None None None MI 3 False 2603 0.00 58.49 41.51
7 A000367 Representative Rep. https://api.propublica.org/congress/v1/members/A000367.json Justin None Amash None 1980-04-18 M R justinamash repjustinamash repjustinamash 412438 1033767 105566 21143 N00031938 /m/0c00p_n H0MI03126 https://amash.house.gov https://amash.house.gov/rss.xml None False None 0.654 None 10 2020 430.0 0.0 5.0 2020-12-28 21:04:36 -0500 ocd-division/country:us/state:mi/cd:3 106 Cannon House Office Building 202-225-3831 None MI 3 False 2603 0.00 61.97 37.79
8 A000369 Representative Rep. https://api.propublica.org/congress/v1/members/A000369.json Mark None Amodei None 1958-06-12 M R None MarkAmodeiNV2 MarkAmodeiNV2 markamodeinv2 412500 62817 12537 21196 N00031177 /m/03bzdkn H2NV02395 https://amodei.house.gov https://amodei.house.gov/rss/news-releases.xml None False R+7 0.384 None 10 2020 954.0 36.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:nv/cd:2 104 Cannon House Office Building 202-225-6155 None NV 2 False 3202 3.77 92.63 7.26
9 A000377 Representative Rep. https://api.propublica.org/congress/v1/members/A000377.json Kelly None Armstrong None 1976-10-08 M R None RepArmstrongND None None 412794 None 139338 None N00042868 /g/11hcszksh3 H8ND00096 https://armstrong.house.gov None None False R+16 NaN None 2 2020 954.0 33.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:nd/cd:1 1004 Longworth House Office Building 202-225-2611 None ND At-Large True 3800 3.46 93.31 6.58
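The two keys can also go into `record_path` together, with the call made on the whole response rather than on `df['results']`; this appears to be what the `[results, members]` attempt in the question was aiming for. The sketch below assumes a hypothetical two-member stand-in for `json_data` so it runs offline; with the real API you would use `json_data = response.json()` instead. Each `meta` path is given relative to the top-level dict, so the resulting columns are named `results.congress` and `results.chamber`.

```python
import pandas as pd

# Hypothetical stand-in for the parsed response; with the real API this
# would be json_data = response.json()
json_data = {
    "status": "OK",
    "results": [
        {
            "congress": "116",
            "chamber": "House",
            "num_results": 451,
            "offset": 0,
            "members": [
                {"id": "A000374", "last_name": "Abraham", "party": "R"},
                {"id": "A000370", "last_name": "Adams", "party": "D"},
            ],
        }
    ],
}

# record_path walks results -> members, giving one row per member;
# meta copies the listed parent fields onto every member row
members = pd.json_normalize(
    json_data,
    record_path=["results", "members"],
    meta=[["results", "congress"], ["results", "chamber"]],
)
print(members)
```

Selecting `json_data['results'][0]` first, as in the answer, gives shorter column names; the nested `record_path` form also handles a `results` list with more than one element.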