Reputation: 471
I'm trying to create a simplified name column. I have a brand name column and a list of strings as shown below. If the brand name column contains any string from list, then create a simplified brand name column with the string matched removed. The other brand name column elements that do not contain any strings from list will be carried over to the simplified column
l = ['co', 'ltd', 'company']
df:
Brand
Nike
Adidas co
Apple company
Intel
Google ltd
Walmart co
Burger King
Desired df:
Brand Simplified
Nike Nike
Adidas co Adidas
Apple company Apple
Intel Intel
Google Ltd Google
Walmart co Walmart
Burger King Burger King
Thanks in advance! Any help is appreciated!!
Upvotes: 2
Views: 610
Reputation: 803
df = {"Brand":["Nike","Adidas co","Apple company","Google ltd","Berger King"]}
df = pd.DataFrame(df)
list_items = ['ltd', 'company', 'co'] # 'company' will be evaluated first before 'co'
df['Simplified'] = [' '.join(w) for w in df['Brand'].str.split().apply(lambda x: [i for i in x if i not in list_items])]
Upvotes: 0
Reputation: 82765
Using str.replace
Ex:
l = ['co', 'ltd', 'company']
df = pd.DataFrame({'Brand': ['Nike', 'Adidas co', 'Apple company', 'Intel', 'Google ltd', 'Walmart co', 'Burger King']})
df['Simplified'] = df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b", "").str.strip()
#or df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b$", "").str.strip() #TO remove only in END of string
print(df)
Output:
Brand Simplified
0 Nike Nike
1 Adidas co Adidas
2 Apple company Apple
3 Intel Intel
4 Google ltd Google
5 Walmart co Walmart
6 Burger King Burger King
Upvotes: 0
Reputation: 2939
how about use this to remove substrings and trailing whitespaces
list_substring = ['ltd', 'company', 'co'] # 'company' will be evaluated first before 'co'
df['Simplified'] = df['Brand'].str.replace('|'.join(list_substring), '').str.lstrip()
Upvotes: 1
Reputation: 17368
In [28]: df
Out[28]:
Brand
0 Nike
1 Adidas co
2 Apple company
3 Intel
4 Google ltd
5 Walmart co
6 Burger King
In [30]: df["Simplified"] = df.Brand.apply(lambda x: x.split()[0] if x.split()[-1] in l else x)
In [31]: df
Out[31]:
Brand Simplified
0 Nike Nike
1 Adidas co Adidas
2 Apple company Apple
3 Intel Intel
4 Google ltd Google
5 Walmart co Walmart
6 Burger King Burger King
Upvotes: 0