Reputation: 1662
I want to convert my nested json format into pandas dataframe i have tried but my data is something looking like this which is not correct
I have tried to fetch the json and save inside the innings dictionary and tried to convert into the pandas which is not working in the proper format
innings is the dictionary which i am trying to convert into pandas dataframe but it is not converting in the proper format
this is my json something like this
{
'1164223': [
{
'ball_limit': '300',
'balls': '300',
'batted': '1',
'batting_team_id': '2591',
'bowling_team_id': '1832',
'bpo': '6',
'byes': '1',
'event': '0',
'event_name': None,
'extras': '11',
'innings_number': '1',
'innings_numth': '1st',
'lead': '308',
'legbyes': '4',
'live_current': '0',
'live_current_name': None,
'minutes': None,
'noballs': '0',
'old_penalty_or_bonus': '0',
'over_limit': '50.0',
'over_limit_run_rate': '6.16',
'over_split_limit': '0.0',
'overs': '50.0',
'overs_docked': '0',
'penalties': '0',
'penalties_field_end': '0',
'penalties_field_start': '0',
'run_rate': '6.16',
'runs': '308',
'target': '0',
'wickets': '6',
'wides': '6'
},
{
'ball_limit': '300',
'balls': '294',
'batted': '1',
'batting_team_id': '1832',
'bowling_team_id': '2591',
'bpo': '6',
'byes': '0',
'event': '0',
'event_name': None,
'extras': '10',
'innings_number': '2',
'innings_numth': '1st',
'lead': '3',
'legbyes': '1',
'live_current': '1',
'live_current_name': 'current innings',
'minutes': None,
'noballs': '1',
'old_penalty_or_bonus': '0',
'over_limit': '50.0',
'over_limit_run_rate': '6.22',
'over_split_limit': '0.0',
'overs': '49.0',
'overs_docked': '0',
'penalties': '0',
'penalties_field_end': '0',
'penalties_field_start': '0',
'run_rate': '6.34',
'runs': '311',
'target': '309',
'wickets': '6',
'wides': '8'
}
],
'1165045': [
{
'ball_limit': '300',
'balls': '271',
'batted': '1',
'batting_team_id': '1003',
'bowling_team_id': '2989',
'bpo': '6',
'byes': '0',
'event': '1',
'event_name': 'all out',
'extras': '10',
'innings_number': '1',
'innings_numth': '1st',
'lead': '169',
'legbyes': '4',
'live_current': '0',
'live_current_name': None,
'minutes': None,
'noballs': '1',
'old_penalty_or_bonus': '0',
'over_limit': '50.0',
'over_limit_run_rate': '3.38',
'over_split_limit': '0.0',
'overs': '45.1',
'overs_docked': '0',
'penalties': '0',
'penalties_field_end': '0',
'penalties_field_start': '0',
'run_rate': '3.74',
'runs': '169',
'target': '0',
'wickets': '10',
'wides': '5'
},
{
'ball_limit': '300',
'balls': '239',
'batted': '1',
'batting_team_id': '2989',
'bowling_team_id': '1003',
'bpo': '6',
'byes': '0',
'event': '3',
'event_name': 'target reached',
'extras': '12',
'innings_number': '2',
'innings_numth': '1st',
'lead': '1',
'legbyes': '6',
'live_current': '1',
'live_current_name': 'current innings',
'minutes': None,
'noballs': '0',
'old_penalty_or_bonus': '0',
'over_limit': '50.0',
'over_limit_run_rate': '3.40',
'over_split_limit': '0.0',
'overs': '39.5',
'overs_docked': '0',
'penalties': '0',
'penalties_field_end': '0',
'penalties_field_start': '0',
'run_rate': '4.26',
'runs': '170',
'target': '170',
'wickets': '3',
'wides': '6'
}
]
}
Upvotes: 0
Views: 311
Reputation: 862511
Use concat
with dictionary comprehension
:
df = pd.concat({k: pd.DataFrame(v) for k, v in j.items()})
print (df)
ball_limit balls batted batting_team_id bowling_team_id bpo byes \
1164223 0 300 300 1 2591 1832 6 1
1 300 294 1 1832 2591 6 0
1165045 0 300 271 1 1003 2989 6 0
1 300 239 1 2989 1003 6 0
event event_name extras ... overs overs_docked penalties \
1164223 0 0 None 11 ... 50.0 0 0
1 0 None 10 ... 49.0 0 0
1165045 0 1 all out 10 ... 45.1 0 0
1 3 target reached 12 ... 39.5 0 0
penalties_field_end penalties_field_start run_rate runs target \
1164223 0 0 0 6.16 308 0
1 0 0 6.34 311 309
1165045 0 0 0 3.74 169 0
1 0 0 4.26 170 170
wickets wides
1164223 0 6 6
1 6 8
1165045 0 10 5
1 3 6
[4 rows x 32 columns]
Another solution is loop in dict comprehension
and add key of outer dicts for list of dicts, last pass to DataFrame
contructor:
df = pd.DataFrame([dict(x, **{'_id':k}) for k, v in j.items() for x in v])
print (df)
_id ball_limit balls batted batting_team_id bowling_team_id bpo byes \
0 1164223 300 300 1 2591 1832 6 1
1 1164223 300 294 1 1832 2591 6 0
2 1165045 300 271 1 1003 2989 6 0
3 1165045 300 239 1 2989 1003 6 0
event event_name ... overs overs_docked penalties \
0 0 None ... 50.0 0 0
1 0 None ... 49.0 0 0
2 1 all out ... 45.1 0 0
3 3 target reached ... 39.5 0 0
penalties_field_end penalties_field_start run_rate runs target wickets wides
0 0 0 6.16 308 0 6 6
1 0 0 6.34 311 309 6 8
2 0 0 3.74 169 0 10 5
3 0 0 4.26 170 170 3 6
[4 rows x 33 columns]
Upvotes: 1