Sinchetru

String replacement with pandas

I have a pandas column with string values like:

White bear
Brown Bear
Brown Bear 100 Kg
White bear 200 cm             

How can I check whether each string contains the sequence 'White bear' and, if it does, replace the entire value (not only the matching sequence) with a string like 'White_bear'?

df['Species'] = df['Species'].str.replace('White bear', 'White_bear')   

did not work for me, because it replaces only the matching sequence and leaves the rest of the value (e.g. ' 200 cm') in place.
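A minimal reproduction of the behaviour described above, assuming a DataFrame built from the four sample values. `regex=False` is passed explicitly so the pattern is treated as a literal string regardless of the pandas version's default:

```python
import pandas as pd

# The four sample values from the question
df = pd.DataFrame({'Species': ['White bear', 'Brown Bear',
                               'Brown Bear 100 Kg', 'White bear 200 cm']})

# str.replace substitutes only the matched substring;
# the rest of each value (e.g. " 200 cm") is kept
partial = df['Species'].str.replace('White bear', 'White_bear', regex=False)
print(partial.tolist())
# ['White_bear', 'Brown Bear', 'Brown Bear 100 Kg', 'White_bear 200 cm']
```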

Answers (1)

MaxU - stand with Ukraine

You can use boolean indexing to overwrite the whole value wherever the pattern matches:

In [173]: df.loc[df.Species.str.contains(r'\bWhite\s+bear\b'), 'Species'] = 'White_bear'

In [174]: df
Out[174]:
             Species
0         White_bear
1         Brown Bear
2  Brown Bear 100 Kg
3         White_bear
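A self-contained sketch of the same boolean-indexing approach. It adds `case=False` so that spellings such as 'White Bear' also match, and `na=False` so that missing values count as non-matches instead of putting NaN into the mask (which `.loc` cannot use as a boolean indexer). The 'WHITE BEAR cub' row is a hypothetical value added only to show the case-insensitive match:

```python
import pandas as pd

# Sample values from the question plus one hypothetical upper-case row
df = pd.DataFrame({'Species': ['White bear', 'Brown Bear',
                               'Brown Bear 100 Kg', 'White bear 200 cm',
                               'WHITE BEAR cub']})

# True where the value contains the words "white" and "bear" (any case),
# separated by one or more whitespace characters; NaN is treated as False
mask = df['Species'].str.contains(r'\bwhite\s+bear\b',
                                  case=False, regex=True, na=False)

# Overwrite the entire cell, not just the matched part, for matching rows
df.loc[mask, 'Species'] = 'White_bear'
print(df['Species'].tolist())
# ['White_bear', 'Brown Bear', 'Brown Bear 100 Kg', 'White_bear', 'White_bear']
```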

Or a slightly more general solution that handles several patterns at once:

In [204]: df
Out[204]:
             Species
0         White bear
1         Brown Bear
2  Brown Bear 100 Kg
3  White bear 200 cm

In [205]: from_re = [r'.*?\bwhite\b\s+\bbear\b.*',r'.*?\bbrown\b\s+\bbear\b.*']

In [206]: to_re = ['White_bear','Brown_bear']

In [207]: df.Species = df.Species.str.lower().replace(from_re, to_re, regex=True)

In [208]: df
Out[208]:
      Species
0  White_bear
1  Brown_bear
2  Brown_bear
3  White_bear

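RegEx explanation: `\b` marks a word boundary, so `\bwhite\b` matches "white" only as a whole word; `\s+` matches one or more whitespace characters; the leading `.*?` and trailing `.*` extend the match over the rest of the string, so the entire value is replaced rather than only the matching words.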
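If lowercasing the whole column is undesirable (values that match no pattern would stay lowercased), one alternative, sketched here rather than taken from the answer above, is to apply the boolean-indexing approach once per pattern from a mapping. The `patterns` dict and the 'Polar bear' row are hypothetical illustrations. Masks are computed on an untouched copy of the column, so if a value matches several patterns, the last matching pattern in the dict wins:

```python
import pandas as pd

# Sample values plus one hypothetical row that matches no pattern
df = pd.DataFrame({'Species': ['White bear', 'Brown Bear',
                               'Brown Bear 100 Kg', 'White bear 200 cm',
                               'Polar bear']})

# Hypothetical mapping: regex pattern -> replacement for the whole cell
patterns = {
    r'\bwhite\s+bear\b': 'White_bear',
    r'\bbrown\s+bear\b': 'Brown_bear',
}

# Match against the original values, not the partially replaced column
original = df['Species'].copy()
for pat, label in patterns.items():
    mask = original.str.contains(pat, case=False, regex=True, na=False)
    df.loc[mask, 'Species'] = label

print(df['Species'].tolist())
# ['White_bear', 'Brown_bear', 'Brown_bear', 'White_bear', 'Polar bear']
```

Unmatched values such as 'Polar bear' keep their original spelling, which is the main difference from the `str.lower().replace(...)` approach.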
