Matthias Gallagher
Matthias Gallagher

Reputation: 471

If column contains substring from list, create new column with removed substring from list

I'm trying to create a simplified name column. I have a brand name column and a list of strings as shown below. If the brand name column contains any string from list, then create a simplified brand name column with the string matched removed. The other brand name column elements that do not contain any strings from list will be carried over to the simplified column

l = ['co', 'ltd', 'company']

df:

Brand
Nike
Adidas co
Apple company
Intel
Google ltd
Walmart co
Burger King

Desired df:

Brand                Simplified
Nike                   Nike
Adidas co             Adidas
Apple company          Apple
Intel                  Intel
Google Ltd             Google
Walmart co            Walmart
Burger King          Burger King

Thanks in advance! Any help is appreciated!!

Upvotes: 2

Views: 610

Answers (4)

Sam S.
Sam S.

Reputation: 803

df = {"Brand":["Nike","Adidas co","Apple company","Google ltd","Berger King"]}
df = pd.DataFrame(df)

list_items = ['ltd', 'company', 'co'] # 'company' will be evaluated first before 'co'
df['Simplified'] = [' '.join(w) for w in df['Brand'].str.split().apply(lambda x: [i for i in x if i not in list_items])]

enter image description here

Upvotes: 0

Rakesh
Rakesh

Reputation: 82765

Using str.replace

Ex:

l = ['co', 'ltd', 'company']
df = pd.DataFrame({'Brand': ['Nike', 'Adidas co', 'Apple company', 'Intel', 'Google ltd', 'Walmart co', 'Burger King']})
df['Simplified'] = df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b", "").str.strip()
#or df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b$", "").str.strip()  #TO remove only in END of string
print(df)

Output:

           Brand   Simplified
0           Nike         Nike
1      Adidas co       Adidas
2  Apple company        Apple
3          Intel        Intel
4     Google ltd       Google
5     Walmart co      Walmart
6    Burger King  Burger King

Upvotes: 0

AdibP
AdibP

Reputation: 2939

how about use this to remove substrings and trailing whitespaces

list_substring = ['ltd', 'company', 'co'] # 'company' will be evaluated first before 'co'
df['Simplified'] = df['Brand'].str.replace('|'.join(list_substring), '').str.lstrip()

Upvotes: 1

bigbounty
bigbounty

Reputation: 17368

In [28]: df
Out[28]:
           Brand
0           Nike
1      Adidas co
2  Apple company
3          Intel
4     Google ltd
5     Walmart co
6    Burger King

In [30]: df["Simplified"] = df.Brand.apply(lambda x: x.split()[0] if x.split()[-1] in l else x)

In [31]: df
Out[31]:
           Brand   Simplified
0           Nike         Nike
1      Adidas co       Adidas
2  Apple company        Apple
3          Intel        Intel
4     Google ltd       Google
5     Walmart co      Walmart
6    Burger King  Burger King

Upvotes: 0

Related Questions