Reputation: 1694
I have a dataframe column text
text
'a red apple'
'the apple is sweet'
'a yellow banana'
'a green melon'
I would like to create another column term
by matching it with a list ['apple', 'banana', 'melon']
for term in the_list:
    df['term'] = df['text'].apply(lambda x: term if term in x else 'None')
The result I get
text term
'a red apple' None
'the apple is sweet' None
'a yellow banana' None
'a green melon' melon
However, I expected it to be
text term
'a red apple' apple
'the apple is sweet' apple
'a yellow banana' banana
'a green melon' melon
I suspect this is because I loop over a list, but I don't know how to put the loop inside the lambda itself.
Upvotes: 1
Views: 374
Reputation: 109
Using the split method only works if every string has the same structure. You have to swap the loop and the lambda expression, like so:
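import pandas as pd

df = pd.DataFrame(['a red apple',
                   'a banana yellow ',
                   'a green melon'], columns=['text'])
the_list = ['apple', 'banana', 'melon']

def fruit_finder(string):
    # Default when no term from the_list occurs in the string
    term_return = 'None'
    for term in the_list:
        if term in string:
            # No break here, so the last matching term wins
            term_return = term
    return term_return

# Apply the function once per row
df['term'] = df['text'].apply(fruit_finder)
print(df)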
This returns the matching value from the list and produces the following output:
               text    term
0       a red apple   apple
1  a banana yellow   banana
2     a green melon   melon
Edit: The reason your initial program doesn't work is that your loop and lambda are mixed up. Each pass of the loop overwrites the whole term column using only the current term, so after the last pass (which checks only for melon) the apple and banana rows come up as None.
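If you would rather keep a lambda, one way to move the loop inside it is a generator expression with next(). This is only a sketch using the data from the question; unlike fruit_finder above, it returns the first matching term rather than the last, which only matters when a string contains more than one term.

```python
import pandas as pd

df = pd.DataFrame({'text': ['a red apple', 'the apple is sweet',
                            'a yellow banana', 'a green melon']})
the_list = ['apple', 'banana', 'melon']

# Each row scans the whole list; next() returns the first term found in
# the text, or the string 'None' when no term matches.
df['term'] = df['text'].apply(
    lambda x: next((term for term in the_list if term in x), 'None')
)
print(df)
```

Because the loop now runs once per row instead of once per term, the column is written a single time and nothing gets overwritten.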
Upvotes: 1
Reputation: 323226
Try with findall, where l is the list of terms:
df['new'] = df['text'].str.findall('|'.join(l)).str[0]
Out[66]:
0 apple
1 apple
2 banana
3 melon
Name: text, dtype: object
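For reference, a self-contained version of the same idea might look like the following. The data and the name the_list are taken from the question; wrapping each term in re.escape is my own addition, in case a term ever contains regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ['a red apple', 'the apple is sweet',
                            'a yellow banana', 'a green melon']})
the_list = ['apple', 'banana', 'melon']

# Build one alternation pattern: apple|banana|melon
pattern = '|'.join(re.escape(term) for term in the_list)

# findall returns a list of matches per row; .str[0] keeps the first one.
# Rows without any match end up as NaN rather than the string 'None'.
df['new'] = df['text'].str.findall(pattern).str[0]
print(df)
```

Note that this matches substrings, so 'pineapple' would also yield 'apple'.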
Upvotes: 0
Reputation: 15480
Use .split (this assumes the term is always the last word of the string, and myList is your list of terms):
df['term'] = df['text'].apply(lambda x: x.split()[-1] if x.split()[-1] in myList else None)
Upvotes: 1