Reputation: 571
I have a pandas column with some string values like:
White bear
Brown Bear
Brown Bear 100 Kg
White bear 200 cm
How can I check whether each string contains the sequence 'White bear' and, if it does, replace the entire value (not only the sequence) with a string like 'White_bear'?
df['Species'] = df['Species'].str.replace('White bear', 'White_bear')
This did not work for me because it replaces only the matched sequence, not the whole value.
Upvotes: 2
Views: 168
Reputation: 210822
You can use boolean indexing:
In [173]: df.loc[df.Species.str.contains(r'\bWhite\s+bear\b'), 'Species'] = 'White_bear'
In [174]: df
Out[174]:
             Species
0         White_bear
1         Brown Bear
2  Brown Bear 100 Kg
3         White_bear
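The boolean-indexing approach above can be sketched as a self-contained script. This is a minimal sketch, not the answer's exact code: the missing value in the sample data and the `case=False` and `na=False` arguments are assumptions added here, so that capitalization differences and missing entries do not break the mask.

```python
import pandas as pd

# Sample data mirroring the question, plus one missing value (an added assumption).
df = pd.DataFrame({
    "Species": ["White bear", "Brown Bear", "Brown Bear 100 Kg",
                "White bear 200 cm", None]
})

# case=False ignores capitalization; na=False treats missing values as
# "no match", so the mask is purely boolean and safe to pass to .loc.
mask = df["Species"].str.contains(r"\bwhite\s+bear\b", case=False, na=False)

# Overwrite the whole cell for every matching row.
df.loc[mask, "Species"] = "White_bear"

print(df)
```

Rows that do not match the pattern, including the missing one, are left exactly as they were.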
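Or a slightly more general solution, which lowercases every value and then maps several patterns to their replacements in one call: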
In [204]: df
Out[204]:
             Species
0         White bear
1         Brown Bear
2  Brown Bear 100 Kg
3  White bear 200 cm
In [205]: from_re = [r'.*?\bwhite\b\s+\bbear\b.*',r'.*?\bbrown\b\s+\bbear\b.*']
In [206]: to_re = ['White_bear','Brown_bear']
In [207]: df.Species = df.Species.str.lower().replace(from_re, to_re, regex=True)
In [208]: df
Out[208]:
      Species
0  White_bear
1  Brown_bear
2  Brown_bear
3  White_bear
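Note that `str.lower()` in the solution above also lowercases any value that matches neither pattern. If that is unwanted, one possible variation is to loop over the patterns and reuse boolean indexing for each one. The sketch below does this; it is a different technique from the `replace(..., regex=True)` call above, and the `patterns` dictionary and the extra "Polar Bear" row are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Species": ["White bear", "Brown Bear", "Brown Bear 100 Kg",
                "White bear 200 cm", "Polar Bear"]
})

# Hypothetical mapping: search pattern -> label that replaces the whole value.
patterns = {
    r"\bwhite\s+bear\b": "White_bear",
    r"\bbrown\s+bear\b": "Brown_bear",
}

for pattern, label in patterns.items():
    # Case-insensitive match; missing values count as "no match".
    mask = df["Species"].str.contains(pattern, case=False, na=False)
    df.loc[mask, "Species"] = label

print(df)
```

Values that match none of the patterns, such as "Polar Bear" here, keep their original capitalization.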
Upvotes: 2