Kallol
Kallol

Reputation: 2189

Replace pandas Dataframe column values with the list if it matches a word

I have a list of colors like this:

color = ['green', 'blue', 'red']

I have a Dataframe like this:

df:
col1        col2
 A        dark green
 B        sea blue
 C          blue
 D       exclusive red
 E          green
 F       pale red

I want to match col2 with the color list. If any word of col2 matches the element of the color list, replace it with the lists value.

The result data frame will be

 col1          col2
  A            green
  B            blue
  C            blue
  D            red
  E            green
  F            red

What is the most efficient way to do it using pandas?

Upvotes: 3

Views: 2751

Answers (2)

cs95
cs95

Reputation: 402483

Use str.extract:

df['col2'] = df.col2.str.extract(f"({'|'.join(color)})", expand=False)
df

  col1   col2
0    A  green
1    B   blue
2    C   blue
3    D    red
4    E  green
5    F    red

For better performance, you can use a list comprehension that uses a precompiled regex pattern to perform re.search:

import re

p = re.compile(rf"({'|'.join(color)})")
def try_extract(s):
    try:
        return p.search(s).group(1)
    except (TypeError, AttributeError):
        return s

df['col2'] = [try_extract(s) for s in df['col2']
df

  col1   col2
0    A  green
1    B   blue
2    C   blue
3    D    red
4    E  green
5    F    red

If the color doesn't match how to keep keep the original color? I don't want nan values there.

This is automatically handled by try_except:

df2 = df.append(pd.Series(
    {'col1': 'G', 'col2': 'something else'}), ignore_index=True)
df2['col2'] = [try_extract(s) for s in df2['col2']]
df2

  col1            col2
0    A           green
1    B            blue
2    C            blue
3    D             red
4    E           green
5    F             red
6    G  something else   # other values are preserved.

For more information on why list comprehensions should be considered a competitive alternative, you can check For loops with pandas - When should I care?.

Upvotes: 1

jezrael
jezrael

Reputation: 862641

Use Series.str.extract with joined values by | for regex OR, last add fillna for replace non matched values (NaNs) by original column:

print (df)
  col1           col2
0    A     dark green
1    B       sea blue
2    C           blue
3    D  exclusive red
4    E          green
5    F           pale <- not matched value

color=['green','blue','red']

pat = r'({})'.format('|'.join(color))
df['col2'] = df['col2'].str.extract(pat, expand=False).fillna(df['col2'])

print (df)
  col1   col2
0    A  green
1    B   blue
2    C   blue
3    D    red
4    E  green
5    F   pale

Upvotes: 1

Related Questions