I'm trying to replace every string in a pandas DataFrame column that contains a certain substring with just that substring. Ideally the change would be applied in place, like an 'inplace=True' result.
I've tried various regular expressions, but since I'm new to this, nothing I've tried has produced the desired result. I'm on Python 3.7.3.
I think the replacement within the DataFrame should look something like
df.replace(to_replace=<regex that matches any string containing the substring>, value='substring', regex=True)
Below is an example of what I'm trying to do:
#original dataframe
import pandas as pd
df = pd.DataFrame({'brand': ['brand1 & brand2', 'brand6', 'brand1/brand3', 'brand9',
                             'brand4 brand3', 'brand8', 'brand1 and brand6']})
df
brand
0 brand1 & brand2
1 brand6
2 brand1/brand3
3 brand9
4 brand4 brand3
5 brand8
6 brand1 and brand6
#desired result
df
brand
0 brand1
1 brand6
2 brand1
3 brand9
4 brand4 brand3
5 brand8
6 brand1
So far, my regular expressions have had no effect. Note that the real brand names don't actually contain the digits 1-9; those are just placeholders to avoid confusion. The actual DataFrame I'm working with has a little over 10k rows. In the 'brands' column, about 2k of those strings contain brand1, and I need to replace every string containing brand1 with just 'brand1'.
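A minimal sketch of how the df.replace(..., regex=True) call described above could work, assuming the column is named 'brand' as in the example. The pattern `.*brand1.*` is an illustration, not something taken from the question. With regex=True, pandas substitutes only the part of each string that the pattern matches, so the pattern has to cover the whole string for the entire value to become 'brand1'.

```python
import pandas as pd

# Example data matching the DataFrame displayed in the question
df = pd.DataFrame({'brand': ['brand1 & brand2', 'brand6', 'brand1/brand3', 'brand9',
                             'brand4 brand3', 'brand8', 'brand1 and brand6']})

# The nested-dict form restricts the regex to the 'brand' column.
# '.*brand1.*' matches the whole string whenever it contains 'brand1',
# so the whole value is replaced rather than just the 'brand1' part.
# inplace=True modifies df directly (and returns None).
df.replace({'brand': r'.*brand1.*'}, {'brand': 'brand1'}, regex=True, inplace=True)

print(df)
```

Strings without 'brand1' (such as 'brand4 brand3') do not match the pattern and are left unchanged.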
Use:
import numpy as np
df['brand'] = np.where(df['brand'].str.contains('brand1'), 'brand1', df['brand'])
Input
brand
0 brand1 & brand2
1 brand6
2 brand1/brand3
3 brand9
4 brand4 brand3
5 brand1 and brand 6
Output
brand
0 brand1
1 brand6
2 brand1
3 brand9
4 brand4 brand3
5 brand1
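A variant of the same idea, sketched under two assumptions that go beyond the example data: the real column may contain missing values, and real brand names may contain characters that are special in regular expressions. Passing regex=False makes str.contains treat 'brand1' as a literal substring. Passing na=False makes missing values count as "no match" instead of producing NaN in the mask; .loc refuses to index with a boolean mask that contains NaN. Assigning through .loc changes df directly, which is close to the 'inplace=True' behaviour the question asks for.

```python
import pandas as pd

# Example data; None stands in for a missing value that real data might contain
df = pd.DataFrame({'brand': ['brand1 & brand2', 'brand6', 'brand1/brand3', 'brand9',
                             'brand4 brand3', 'brand8', 'brand1 and brand6', None]})

# Literal substring match; missing values are treated as non-matches
mask = df['brand'].str.contains('brand1', regex=False, na=False)

# Overwrite only the rows whose value contains 'brand1'
df.loc[mask, 'brand'] = 'brand1'

print(df)
```

The missing value in the last row is left untouched because its mask entry is False.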