Reputation: 1112
I am trying to write more Pythonic code, e.g. by using list comprehensions. Here, I want to create a new column 'Tag' that holds the element of the list buyer found in the 'Text' column, or 'n/a' if none is found, as shown in the dataframe news_df_output.
import pandas as pd

news = {'Text':['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'], 'Source':['NYT', 'WP', 'Guardian']}
news_df = pd.DataFrame(news)
buyer = ['Amazon', "Adidas", 'Walmart', 'Children Place', 'Levi', 'VF']
# news_df['Tag'] = [x for x in buyer if news_df['Text'].str.contains(x) else 'n/a']
output_news = {'Text':['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'], 'Source':['NYT', 'WP', 'Guardian'], 'Tag':['n/a', 'Adidas', 'n/a']}
news_df_output = pd.DataFrame(output_news)
news_df_output
However, my code raises an invalid syntax error.
What is the problem here?
Upvotes: 0
Views: 1017
Reputation: 863216
You can join the values of the list with | (regex OR) and use Series.str.extract:
news_df['Tag'] = news_df['Text'].str.extract('(' + '|'.join(buyer) + ')')
print(news_df)
                         Text    Source     Tag
0       Nike invests in shoes       NYT     NaN
1  Adidas invests in t-shirts        WP  Adidas
2             dog drank water  Guardian     NaN
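If any of the names could contain regex metacharacters (for example '.' or '+'), it may be safer to escape them before joining. The following is a minimal sketch under that assumption, not part of the original answer; it also passes expand=False to get a Series back and uses fillna to produce the 'n/a' placeholder from the question.

```python
import re
import pandas as pd

news_df = pd.DataFrame({
    'Text': ['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'],
    'Source': ['NYT', 'WP', 'Guardian'],
})
buyer = ['Amazon', 'Adidas', 'Walmart', 'Children Place', 'Levi', 'VF']

# Escape each name so that special regex characters are matched literally,
# then join the names with | (regex OR) inside a single capture group.
pattern = '(' + '|'.join(re.escape(b) for b in buyer) + ')'

# expand=False returns a Series for a single capture group;
# rows without a match are NaN, which fillna replaces with 'n/a'.
news_df['Tag'] = news_df['Text'].str.extract(pattern, expand=False).fillna('n/a')
print(news_df)
```

Note that this matches substrings, so a short name such as 'VF' would also match inside a longer word.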
To get all matches, your solution can be changed to a nested list comprehension:
news_df['Tag'] = [[y for y in buyer if y in x] for x in news_df['Text']]
print(news_df)
                         Text    Source       Tag
0       Nike invests in shoes       NYT        []
1  Adidas invests in t-shirts        WP  [Adidas]
2             dog drank water  Guardian        []
Or, for the first match only, use next with iter, which lets you set NaN as the default if there is no match:
import numpy as np

news_df['Tag'] = [next(iter([y for y in buyer if y in x]), np.nan) for x in news_df['Text']]
print(news_df)
                         Text    Source     Tag
0       Nike invests in shoes       NYT     NaN
1  Adidas invests in t-shirts        WP  Adidas
2             dog drank water  Guardian     NaN
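As for the syntax error in the question: an if placed after the for in a list comprehension only filters items and cannot take an else, so [x for x in buyer if ... else 'n/a'] is not valid Python. In addition, news_df['Text'].str.contains(x) returns a whole boolean Series rather than a single True/False. A sketch of the first-match idea that yields 'n/a' directly (the names b and text are illustrative only) could look like this:

```python
import pandas as pd

news_df = pd.DataFrame({
    'Text': ['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'],
    'Source': ['NYT', 'WP', 'Guardian'],
})
buyer = ['Amazon', 'Adidas', 'Walmart', 'Children Place', 'Levi', 'VF']

# The outer comprehension loops over the rows of 'Text'.
# For each row, next() takes the first buyer found in the text from a
# generator expression, or falls back to the default 'n/a' if none matches.
news_df['Tag'] = [
    next((b for b in buyer if b in text), 'n/a')
    for text in news_df['Text']
]
print(news_df)
```

Passing a generator expression to next avoids building an intermediate list and stops at the first match.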
Upvotes: 1