Reputation: 147
how to concatenate key values of JSON object stored in pandas dataframe cell into a string per row? Sorry, I feel my problem is pretty straight-forward but I cannot find a good way to phrase it.
Let's say I have a pandas dataframe, df, that contains a column named "participants". The cell values are JSON objects, like this for instance:
df['participants'][0] == df.participants[0] ==
[{'participantId': 1,
'championId': 7 },
{'participantId': 2,
'championId': 350 },
{'participantId': 3,
'championId': 266 },
{'participantId': 4,
'championId': 517 },
{'participantId': 5,
'championId': 110, },
...
...
{'participantId': 10,
'championId': 10 }]
df.participants[1]
would include totally different information, with the same structure. If anybody's interested, this is part of what the League of Legends RiotWatcher python API spits out for per-game data.
My goal is to, for each participantId
, concatenate that into a single string per row in our df, such that we have a new column 'x' that contains a string '7, 350, 266, 517, 110'
for each row depending on whatever is in the participants column.
for i in range(0, 20): #range of however many rows we have in dataframe, assume 20
y = ''
for j in range(0, 10): #there are always ten participants
this_champion_id = str(df_d1['participants'][i][j].get('championId'))
y += ' '+this_champion_id
df_d1['x'] = y
(Sidenote: I am avoiding using lists, because I've read lists are not vectorized in pandas, which means they are slower. That's why I am using a string here.)
However, as my data is about 100k rows long, this feels like it's not the fastest solution, especially since I think nested for loops are slower right?
Would it be possible to do something like
df['x'] = [str(df_d1['participants'][key][value].get('championId') for key, value in df['participants']]
?
I am thinking a way of using a single for loop would be by leveraging the json library, like:
for i in range(0, 20):
x = str(pd.json_normalize(df_d1.participants[i])['championId'].values)
df['x'] = x
Has anybody ran into something similar? Did you find a painless solution to this problem? My solutions are taking some time to run.
Thank you!
Upvotes: 0
Views: 793
Reputation: 1379
In [16]: df['x'] = df['participants'].map(lambda x: ', '.join(str(i['participantId']) for i in x))
...: print(df['participants'][0])
...: print(df['x'][0])
...:
[{'participantId': 1, 'championId': 7}, {'participantId': 2, 'championId': 350}, {'participantId': 3, 'championId': 266}]
1, 2, 3
Upvotes: 1