Reputation: 495
I am doing some EDA on the PUBG data from the Kaggle competition. I would like to convert the game modes into the standard forms Solo, Duo, Squad, Flare and Crash.
Here is a list of unique values:
{'flaretpp', 'crashtpp', 'squad-fpp', 'duo-fpp', 'crashfpp', 'normal-squad',
'normal-squad-fpp', 'normal-duo-fpp', 'normal-duo', 'normal-solo', 'squad',
'duo', 'solo-fpp', 'solo', 'normal-solo-fpp', 'flarefpp'}
I basically want to remove the "normal-", "-fpp", "fpp", and "tpp" substrings from the values.
I have some code that works, but it is very slow (there are approx. 4.5M rows). Is there a faster/better way to do this?
for i in range(len(data['matchType'])):
    data['matchType'][i] = data['matchType'][i].replace('normal-','')
    data['matchType'][i] = data['matchType'][i].replace('-fpp','')
    data['matchType'][i] = data['matchType'][i].replace('tpp','')
    data['matchType'][i] = data['matchType'][i].replace('fpp','')
Upvotes: 0
Views: 33
Reputation: 249434
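Load your data into a Pandas Series and do it with a single vectorized command. Pass regex=True explicitly, because pandas 2.0 and later treat the pattern as a literal string by default:
mymode.str.replace(r'normal-|-fpp|fpp|tpp', '', regex=True)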
Using your example data, that gives you:
0     flare
1     crash
2     squad
3       duo
4     crash
5     squad
6     squad
7       duo
8       duo
9      solo
10    squad
11      duo
12     solo
13     solo
14     solo
15    flare
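For the DataFrame in the question, the same idea might look roughly like the sketch below. It assumes the column is called matchType, as in the question's loop; the helper names normalize_match_type and normalize_via_uniques are made up for illustration, and the capitalization step is only there to reach the "Solo, Duo, ..." spelling the question asks for. The second helper is an optional variant that cleans only the distinct values and maps them back, which may help when a few labels repeat across millions of rows; I have not timed it on the real data.

```python
import pandas as pd

# Sample values copied from the question; the real column has ~4.5M rows.
data = pd.DataFrame({
    "matchType": [
        "flaretpp", "crashtpp", "squad-fpp", "duo-fpp", "crashfpp",
        "normal-squad", "normal-squad-fpp", "normal-duo-fpp", "normal-duo",
        "normal-solo", "squad", "duo", "solo-fpp", "solo",
        "normal-solo-fpp", "flarefpp",
    ]
})

# Same alternation as in the answer above.
PATTERN = r"normal-|-fpp|fpp|tpp"


def normalize_match_type(series: pd.Series) -> pd.Series:
    """Hypothetical helper: strip the mode decorations, then capitalize."""
    return series.str.replace(PATTERN, "", regex=True).str.capitalize()


def normalize_via_uniques(series: pd.Series) -> pd.Series:
    """Hypothetical variant: clean only the distinct labels, then map back."""
    uniques = pd.Series(series.unique())
    mapping = dict(zip(uniques, normalize_match_type(uniques)))
    return series.map(mapping)


# Overwrite the whole column in one assignment instead of row by row.
data["matchType"] = normalize_match_type(data["matchType"])
print(sorted(data["matchType"].unique()))
# ['Crash', 'Duo', 'Flare', 'Solo', 'Squad']
```

Assigning the whole column at once also avoids the chained indexing (data['matchType'][i] = ...) used in the original loop, which pandas may warn about.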
Upvotes: 3