I downloaded a .json file from this web page and converted it into a dictionary with the following commands:
import json
import urllib.request

with urllib.request.urlopen("https://www.bcusu.com/svc/voting/stats/election/paramstats/109?groupIds=1,12,7,3,6&sortBy=itemname&sortDirection=ascending") as url:
    data = json.loads(url.read().decode())
#print(data)
My final goal is to convert my data dictionary into a pandas DataFrame. The problem is that data is nested and, to complicate things a bit further, a single column (Groups) is nested even more deeply.
I found a solution that does the job for a 'uniformly' nested dictionary such as the following:
user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}
By 'uniformly nested' I mean that every level of the dictionary above has the same number of keys: 12 and 15 both have the two keys Category 1 and Category 2, which in turn both have the two keys att_1 and att_2. That is not the case in my data.
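For comparison, the uniformly nested case has a common pandas idiom (the linked solution is not preserved here, so this sketch may differ from it): flatten the two outer levels into (outer, inner) tuple keys and let DataFrame.from_dict(..., orient='index') turn each innermost dict into one row. The tuple keys become a two-level MultiIndex.

```python
import pandas as pd

# The 'uniformly nested' example dictionary from the question.
user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Key each innermost dict by an (outer_key, inner_key) tuple; with
# orient='index' each innermost dict becomes a row and its keys the columns.
df = pd.DataFrame.from_dict(
    {(outer, inner): attrs
     for outer, inner_dict in user_dict.items()
     for inner, attrs in inner_dict.items()},
    orient='index',
)
print(df)
```

If some innermost dicts lacked a key, the corresponding cells would simply be NaN; the pattern breaks down once one of the values is itself a list of records, as Groups is here.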
Looking at your data, the complication comes from Groups, so I isolated it and worked on it separately, creating one DataFrame per group. Here is the code:
import pandas as pd

# One DataFrame per group, keyed by the group's name
data_df = {}
for category in data.get('Groups'):
    # print(category)
    data_df[category.get('Name')] = pd.DataFrame.from_records(category.get('Items'))
Here is the output for two of the groups, starting with Faculty:
data_df['Faculty']
Eligible IsOtherItem Name NonVoters RelativeTurnout Turnout Voters
0 7249 False Faculty of Business, Law and Social Sciences 5880 4.779694 18.885363 1369
1 6226 False Faculty of Arts, Design and Media 5187 3.627540 16.688082 1039
2 6156 False Faculty of Computing, Engineering and the Buil... 5482 2.353188 10.948668 674
3 8943 False Faculty of Health, Education and Life Sciences 7958 3.439006 11.014201 985
4 71 True Other 56 0.052371 21.126761 15
And for Age-Range:
Eligible IsOtherItem Name NonVoters RelativeTurnout Turnout Voters
0 13246 False 18 - 21 10657 9.039173 19.545523 2589
1 6785 False 22 - 25 5939 2.953704 12.468681 846
2 3133 False 26 - 30 2862 0.946163 8.649856 271
3 5392 False Over 30 5024 1.284826 6.824926 368
and similarly for the other groups.
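To get a single DataFrame, as the question asks, the per-group frames can be stacked with pd.concat, which uses the dict keys as an outer index level. This is a minimal sketch assuming each entry of data['Groups'] has a 'Name' and an 'Items' list of dicts (as the loop above implies); the payload is a hand-trimmed stand-in built from a few rows and columns of the output shown, not the real response.

```python
import pandas as pd

# Hand-trimmed stand-in for the downloaded JSON (assumed structure).
data = {
    'Groups': [
        {'Name': 'Faculty',
         'Items': [{'Name': 'Faculty of Arts, Design and Media', 'Eligible': 6226, 'Voters': 1039},
                   {'Name': 'Other', 'Eligible': 71, 'Voters': 15}]},
        {'Name': 'Age-Range',
         'Items': [{'Name': '18 - 21', 'Eligible': 13246, 'Voters': 2589},
                   {'Name': 'Over 30', 'Eligible': 5392, 'Voters': 368}]},
    ]
}

# One DataFrame per group, as in the answer's loop.
data_df = {group['Name']: pd.DataFrame.from_records(group['Items'])
           for group in data['Groups']}

# Stack the frames; each dict key becomes the outer 'Group' index level.
combined = pd.concat(data_df, names=['Group', 'Row'])
print(combined)
```

combined.loc['Faculty'] recovers a single group, and combined.reset_index(level='Group') turns the group name into an ordinary column if a flat table is preferred.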
Once Groups has been handled, the rest of data is just an information dictionary, so the nested key can be dropped:
del data['Groups']
You can turn what remains into a Series or into another DataFrame.
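A sketch of both options follows. The keys and values are hypothetical placeholders, since the post does not show what remains in data after the deletion.

```python
import pandas as pd

# Hypothetical stand-in for what is left in `data` after del data['Groups'];
# the real key names and values are not shown in the post.
info = {'Title': 'Example election', 'Eligible': 100, 'Voters': 20,
        'Details': {'Status': 'closed'}}

# Series: one entry per top-level key (a nested dict stays a single value).
info_series = pd.Series(info)

# One-row DataFrame; json_normalize also flattens nested dicts into
# dotted column names such as 'Details.Status'.
info_frame = pd.json_normalize(info)

print(info_series)
print(info_frame)
```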
If you know how the data was generated, you can do further analysis and build your final DataFrame.
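As an alternative that the answer does not use, pandas.json_normalize can flatten the nested lists in one call: record_path walks data['Groups'][...]['Items'][...] to produce one row per item, and meta copies each group's Name onto its rows as a 'Groups.Name' column. The payload is again a hand-trimmed stand-in, assuming the structure the answer's loop implies.

```python
import pandas as pd

# Hand-trimmed stand-in for the downloaded JSON (assumed structure).
data = {
    'Groups': [
        {'Name': 'Faculty',
         'Items': [{'Name': 'Faculty of Arts, Design and Media', 'Eligible': 6226, 'Voters': 1039},
                   {'Name': 'Other', 'Eligible': 71, 'Voters': 15}]},
        {'Name': 'Age-Range',
         'Items': [{'Name': '18 - 21', 'Eligible': 13246, 'Voters': 2589},
                   {'Name': 'Over 30', 'Eligible': 5392, 'Voters': 368}]},
    ]
}

# One row per item; each group's 'Name' is repeated into 'Groups.Name'.
flat = pd.json_normalize(data, record_path=['Groups', 'Items'],
                         meta=[['Groups', 'Name']])
print(flat)
```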