Reputation: 1352
I have a JSON file that records the goals scored in each match, including the minute of each goal. I tried to flatten it using the following code:
import json
import pandas as pd

data_Loc = 'Season Fixtures.json'
with open(data_Loc) as data_file:
    d = json.load(data_file)
df_Fixtures = pd.io.json.json_normalize(d, 'matches')
The output is as follows:
Then I convert goals to series using:
df_goal = df_Fixtures.goals.apply(pd.Series)
and the output is as follows:
The resulting columns still contain nested dictionaries.
How can I flatten the goals column directly into its individual fields, such as period?
The input data file can be downloaded from here
Can anyone advise me how to flatten the goals column down to its innermost level? That is, the goals column should be expanded into multiple columns such as period, minute, playerId, teamId and type.
To include matchId, I created a new DataFrame as follows and concatenated it with the DataFrame produced by the code Jez suggested:
df_MatchID = pd.io.json.json_normalize(d,'matches')
df_MatchID = df_MatchID[['matchId']]
df_Fixtures_details = pd.concat([df_MatchID,df_Fixtures],axis =1)
The output is as follows (the other columns show NaN):
Thanks, Zep
Upvotes: 1
Views: 56
Reputation: 862581
I believe you need:
df_Fixtures = pd.io.json.json_normalize(d, ['matches','goals'])
print (df_Fixtures.head())
   minute      period  playerId  teamId  type
0      14   FirstHalf    206314    3161  goal
1      72  SecondHalf     20661    3204  goal
2      78  SecondHalf    206314    3161  goal
3       3   FirstHalf    300830    3187  goal
4      72  SecondHalf     21385    3187  goal
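For intuition, the record path ['matches', 'goals'] asks for one row per goal across all matches. A rough plain-Python sketch of that idea follows. The sample dictionary is hypothetical: it is only shaped the way the call above implies the file is laid out (a top-level 'matches' list whose items each carry a 'goals' list), and the loop approximates what json_normalize does rather than reproducing it.

```python
# Hypothetical sample, assumed to mirror the layout of 'Season Fixtures.json':
# a top-level 'matches' list, each match holding a list of goal dicts.
d = {
    "matches": [
        {
            "matchId": 2759508,
            "goals": [
                {"minute": 14, "period": "FirstHalf", "playerId": 206314,
                 "teamId": 3161, "type": "goal"},
                {"minute": 72, "period": "SecondHalf", "playerId": 20661,
                 "teamId": 3204, "type": "goal"},
            ],
        },
        {
            "matchId": 2759507,
            "goals": [
                {"minute": 3, "period": "FirstHalf", "playerId": 300830,
                 "teamId": 3187, "type": "goal"},
            ],
        },
    ]
}

# Walk 'matches', then each match's 'goals', emitting one flat row per goal.
rows = [goal for match in d["matches"] for goal in match["goals"]]

for row in rows:
    print(row)
```

Note that the rows contain only the goal fields at this point, so the matchId of the parent match is lost.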
EDIT:
data_Loc = 'Season Fixtures.json'
with open(data_Loc) as data_file:
    d = json.load(data_file)['matches']
df = pd.io.json.json_normalize(d, ['goals'], 'matchId')
print(df.head())
   minute      period  playerId  teamId  type  matchId
0      14   FirstHalf    206314    3161  goal  2759508
1      72  SecondHalf     20661    3204  goal  2759508
2      78  SecondHalf    206314    3161  goal  2759508
3       3   FirstHalf    300830    3187  goal  2759507
4      72  SecondHalf     21385    3187  goal  2759507
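To try the record path plus meta combination without the original file, here is a small self-contained sketch. The inline matches list is an assumed stand-in for the file's 'matches' entry, not the real data, and it uses pd.json_normalize, the top-level name available since pandas 1.0 (the pd.io.json.json_normalize spelling used above is the older, deprecated form).

```python
import pandas as pd

# Assumed stand-in for json.load(data_file)['matches']; values are illustrative.
matches = [
    {
        "matchId": 2759508,
        "goals": [
            {"minute": 14, "period": "FirstHalf", "playerId": 206314,
             "teamId": 3161, "type": "goal"},
            {"minute": 72, "period": "SecondHalf", "playerId": 20661,
             "teamId": 3204, "type": "goal"},
        ],
    },
    {
        "matchId": 2759507,
        "goals": [
            {"minute": 3, "period": "FirstHalf", "playerId": 300830,
             "teamId": 3187, "type": "goal"},
        ],
    },
]

# record_path: the nested list that becomes the rows (one row per goal).
# meta: fields of the parent match to repeat on each of its goal rows.
df = pd.json_normalize(matches, record_path="goals", meta=["matchId"])
print(df)
```

Each goal row carries the matchId of the match it came from, so a separate concat on the index should not be needed.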
Upvotes: 2