Reputation: 5532
Say I have a list BRANDS that contains brand names:
BRANDS = ['Samsung', 'Apple', 'Nike', .....]
Dataframe A has following structure
row item_title brand_name
1 | Apple 6S | Apple
2 | Nike BB Shoes | na <-- need to fill with Nike
3 | Samsung TV | na <--need fill with Samsung
4 | Used bike | na <--No need to do anything because there is no brand_name in the title
....
I want to fill the column brand_name of Row 2 with Nike, Row 3 with Samsung, because they null and the item_title contains keywords that can be found in list BRANDS. How can I do it?
Upvotes: 3
Views: 554
Reputation: 1096
You can achieve the result you are after by writing a simple function. You can then use .apply()
with a lambda function
to generate your desired column.
def contains_any(s, arr):
for item in arr:
if item in s: return item
return np.nan
df['brand_name'] = df['product'].apply(lambda x: match_substring(x, product_map))
Upvotes: 0
Reputation: 210982
Vectorized solution:
In [168]: x = df.item_title.str.split(expand=True)
In [169]: df['brand_name'] = \
df['brand_name'].fillna(x[x.isin(BRANDS)]
.ffill(axis=1)
.bfill(axis=1)
.iloc[:, 0])
In [170]: df
Out[170]:
row item_title brand_name
0 1 Apple 6S Apple
1 2 Nike BB Shoes Nike
2 3 Samsung TV Samsung
3 4 Used bike NaN
Upvotes: 3
Reputation: 43544
One approach is to use apply()
:
import pandas as pd
BRANDS = ['Samsung', 'Apple', 'Nike']
def get_brand_name(row):
if ~pd.isnull(row['brand_name']):
# don't do anything if brand_name is not null
return row['brand_name']
item_title = row['item_title']
title_words = map(str.title, item_title.split())
for tw in title_words:
if tw in BRANDS:
# return first 'match'
return tw
# default return None
return None
df['brand_name'] = df.apply(lambda x: get_brand_name(x), axis=1)
print(df)
# row item_title brand_name
#0 1 Apple 6S Apple
#1 2 Nike BB Shoes Nike
#2 3 Samsung TV Samsung
#3 4 Used bike None
Notes
set
instead of a list
because lookups will be faster. However, this won't work if you care about order.Upvotes: 1