yeqiuuu
yeqiuuu

Reputation: 147

How to concatenate key values of JSON object stored in pandas dataframe cell into a string per row?

My question is:

how to concatenate key values of JSON object stored in pandas dataframe cell into a string per row? Sorry, I feel my problem is pretty straight-forward but I cannot find a good way to phrase it.

My context is:

Let's say I have a pandas dataframe, df, that contains a column named "participants". The cell values are JSON objects, like this for instance:

df['participants'][0] == df.participants[0] ==

[{'participantId': 1,
  'championId': 7 },
 {'participantId': 2,
  'championId': 350 },
 {'participantId': 3,
  'championId': 266 },
 {'participantId': 4,
  'championId': 517 },
 {'participantId': 5,
  'championId': 110, },
...
...
{'participantId': 10,
'championId': 10 }]

df.participants[1] would include totally different information, with the same structure. If anybody's interested, this is part of what the League of Legends RiotWatcher python API spits out for per-game data.

My goal is to, for each participantId, concatenate that into a single string per row in our df, such that we have a new column 'x' that contains a string '7, 350, 266, 517, 110' for each row depending on whatever is in the participants column.

My working solutions are:

for i in range(0, 20): #range of however many rows we have in dataframe, assume 20
    y = ''
    for j in range(0, 10): #there are always ten participants
        this_champion_id = str(df_d1['participants'][i][j].get('championId'))
        y += ' '+this_champion_id
    df_d1['x'] = y

(Sidenote: I am avoiding using lists, because I've read lists are not vectorized in pandas, which means they are slower. That's why I am using a string here.)

However, as my data is about 100k rows long, this feels like it's not the fastest solution, especially since I think nested for loops are slower right?

Would it be possible to do something like

df['x'] = [str(df_d1['participants'][key][value].get('championId') for key, value in df['participants']] ?

I am thinking a way of using a single for loop would be by leveraging the json library, like:

for i in range(0, 20):
    x = str(pd.json_normalize(df_d1.participants[i])['championId'].values)
    df['x'] = x

Has anybody ran into something similar? Did you find a painless solution to this problem? My solutions are taking some time to run.

Thank you!

Upvotes: 0

Views: 793

Answers (1)

madbird
madbird

Reputation: 1379

In [16]: df['x'] = df['participants'].map(lambda x: ', '.join(str(i['participantId']) for i in x))
    ...: print(df['participants'][0])
    ...: print(df['x'][0])
    ...:
[{'participantId': 1, 'championId': 7}, {'participantId': 2, 'championId': 350}, {'participantId': 3, 'championId': 266}]
1, 2, 3

Upvotes: 1

Related Questions