Reputation: 2189
I have a list of colors like this:
color = ['green', 'blue', 'red']
I have a Dataframe like this:
df:
col1 col2
A dark green
B sea blue
C blue
D exclusive red
E green
F pale red
I want to match col2
with the color
list. If any word of col2
matches the element of the color
list, replace it with the lists value.
The result data frame will be
col1 col2
A green
B blue
C blue
D red
E green
F red
What is the most efficient way to do it using pandas?
Upvotes: 3
Views: 2751
Reputation: 402483
Use str.extract
:
df['col2'] = df.col2.str.extract(f"({'|'.join(color)})", expand=False)
df
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F red
For better performance, you can use a list comprehension that uses a precompiled regex pattern to perform re.search
:
import re
p = re.compile(rf"({'|'.join(color)})")
def try_extract(s):
try:
return p.search(s).group(1)
except (TypeError, AttributeError):
return s
df['col2'] = [try_extract(s) for s in df['col2']
df
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F red
If the color doesn't match how to keep keep the original color? I don't want nan values there.
This is automatically handled by try_except
:
df2 = df.append(pd.Series(
{'col1': 'G', 'col2': 'something else'}), ignore_index=True)
df2['col2'] = [try_extract(s) for s in df2['col2']]
df2
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F red
6 G something else # other values are preserved.
For more information on why list comprehensions should be considered a competitive alternative, you can check For loops with pandas - When should I care?.
Upvotes: 1
Reputation: 862641
Use Series.str.extract
with joined values by |
for regex OR
, last add fillna
for replace non matched values (NaN
s) by original column:
print (df)
col1 col2
0 A dark green
1 B sea blue
2 C blue
3 D exclusive red
4 E green
5 F pale <- not matched value
color=['green','blue','red']
pat = r'({})'.format('|'.join(color))
df['col2'] = df['col2'].str.extract(pat, expand=False).fillna(df['col2'])
print (df)
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F pale
Upvotes: 1