Reputation: 5095
How can I filter df
rows where name
contains Al
, and replace large
with L
in sport
?
Reproducible example:
df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice'],
'sport': ['tennis small', 'football medium', 'basketball large']})
Expected output:
name sport
0 Bob tennis small
1 Jane football medium
2 Alice basketball L
Upvotes: 2
Views: 878
Reputation: 862611
Filter only necessary rows and for them use Series.str.replace
, it is better for performance like replace all column without filtering:
m = df['name'].str.contains('Al', na=False)
df.loc[m, 'sport'] = df.loc[m, 'sport'].str.replace('large', 'L', regex=True)
print (df)
name sport
0 Bob tennis small
1 Jane football medium
2 Alice basketball L
#100 matched values from 30k
df = pd.DataFrame({'name': ['Bob','Jane','alice'] * 9900 + ['Bob', 'Jane', 'Alice'] * 100,
'sport': ['tennis small','football medium', 'basketball large'] * 10000})
print (df)
name sport
0 Bob tennis small
1 Jane football medium
2 alice basketball large
3 Bob tennis small
4 Jane football medium
... ...
29995 Jane football medium
29996 Alice basketball large
29997 Bob tennis small
29998 Jane football medium
29999 Alice basketball large
[30000 rows x 2 columns]
In [76]: %%timeit
...: m = df['name'].str.contains('Al')
...: df.loc[m, 'sport'] = df.loc[m, 'sport'].str.replace('large', 'L', regex=True)
...:
...:
14.6 ms ± 193 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [77]: %%timeit
...: df.loc[df.name.str.contains('Al'), 'sport'] = df.sport.str.replace('large', 'L')
...:
...:
34.8 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [78]: %%timeit
...: df['sport'] = np.where(df['name'].str.contains('Al'), df['sport'].str.replace('large', 'L', regex=True), df['sport'])
...:
...:
35 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#10k matched values from 30k
df = pd.DataFrame({'name': ['Bob', 'Jane','Alice'] * 10000,
'sport': ['tennis small', 'football medium', 'basketball large'] * 10000})
print (df)
In [80]: %%timeit
...: m = df['name'].str.contains('Al')
...: df.loc[m, 'sport'] = df.loc[m, 'sport'].str.replace('large', 'L', regex=True)
...:
...:
22.2 ms ± 148 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [81]: %%timeit
...: df.loc[df.name.str.contains('Al'), 'sport'] = df.sport.str.replace('large', 'L')
...:
...:
34 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [82]: %%timeit
...: df['sport'] = np.where(df['name'].str.contains('Al'), df['sport'].str.replace('large', 'L', regex=True), df['sport'])
...:
...:
34.9 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 1
Reputation: 13821
Similarly you can achieve that with numpy.where
in conjunction with str.contains
:
import numpy as np
df['sport'] = np.where(df['name'].str.contains('Al'), # Condition
df['sport'].str.replace('large', 'L', regex=True), # return if True
df['sport']) # return if False
>>> df
name sport
0 Bob tennis small
1 Jane football medium
2 Alice basketball L
Upvotes: 1
Reputation: 71570
Try with loc
and str.contains
with str.replace
:
df.loc[df.name.str.contains('Al'), 'sport'] = df.sport.str.replace('large', 'L')
Example:
>>> df.loc[df.name.str.contains('Al'), 'sport'] = df.sport.str.replace('large', 'L')
>>> df
name sport
0 Bob tennis small
1 Jane football medium
2 Alice basketball L
>>>
Filter the rows where the name
column contains Al
and then replace large
with L
in the sports
column.
Upvotes: 1