uncrazimatic

Reputation: 27

Regex to replace a string containing a substring with only that substring in a DataFrame

I am trying to replace every string in a pandas DataFrame column that contains a certain substring with only the substring itself. Preferably it would be an 'inplace=True' sort of result.

I've tried various regular expressions, but I'm new to this and nothing I have tried has yielded the desired result. I am on Python 3.7.3.

I think the call I need to make the replacement within the dataframe is something like:

df.replace(to_replace=<regex that matches any string containing the substring>, value='substring', regex=True)

Below is an example of what I'm trying to do.

#original dataframe
import pandas as pd

df = pd.DataFrame({'brand':['brand1 & brand2','brand6','brand1/brand3','brand9','brand4 brand3','brand8','brand1 and brand6']})
df

    brand
0   brand1 & brand2
1   brand6
2   brand1/brand3
3   brand9
4   brand4 brand3
5   brand8
6   brand1 and brand6

#desired result

df

    brand
0   brand1
1   brand6
2   brand1
3   brand9
4   brand4 brand3
5   brand8
6   brand1

So far, my regular expressions have effected no change. Just as a note, to avoid any possible confusion: the brand names don't actually include 1-9. The actual df I'm manipulating has a little over 10k rows, and within the column 'brands' the strings that contain brand1 make up about 2k of the 10k. I need to replace every string containing brand1 with just 'brand1' alone.

Upvotes: 0

Views: 264

Answers (1)

moys

Reputation: 8033

Use:

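import numpy as np

# Where the string contains 'brand1', use 'brand1'; otherwise keep the original value
df['brand'] = np.where(df['brand'].str.contains('brand1'), 'brand1', df['brand'])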

Input

    brand
0   brand1 & brand2
1   brand6
2   brand1/brand3
3   brand9
4   brand4 brand3
5   brand1 and brand 6

Output

    brand
0   brand1
1   brand6
2   brand1
3   brand9
4   brand4 brand3
5   brand1
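
Since the question asked specifically for a regex passed to replace, here is a minimal sketch of that route, offered as an alternative rather than a correction of the np.where line. It assumes the same sample data as the Input above; the names `target` and `pattern` are only illustrative. The pattern is anchored so that it matches the whole string, which means the whole string is swapped for the substring rather than just the matched part.

```python
import re
import pandas as pd

df = pd.DataFrame({'brand': ['brand1 & brand2', 'brand6', 'brand1/brand3',
                             'brand9', 'brand4 brand3', 'brand1 and brand 6']})

target = 'brand1'  # illustrative name for the substring to keep

# '^.*' and '.*$' make the pattern cover the entire string, so the
# replacement overwrites everything. re.escape is a precaution in case
# a real brand name contains regex metacharacters such as '.' or '+'.
pattern = rf'^.*{re.escape(target)}.*$'

df['brand'] = df['brand'].replace(pattern, target, regex=True)

print(df['brand'].tolist())
# ['brand1', 'brand6', 'brand1', 'brand9', 'brand4 brand3', 'brand1']
```

Note that a plain substring pattern would also match longer names that merely start with the same text (for example a hypothetical 'brand10'), so the pattern may need a word boundary or similar if the real brand names overlap like that.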

Upvotes: 1
