Reputation: 1694

Label dataframe column with a list

I have a dataframe column text

text
'a red apple'
'the apple is sweet'
'a yellow banana'
'a green melon'

I would like to create another column term by matching it with a list ['apple', 'banana', 'melon']

for term in the_list:
    df['term'] = df['text'].apply(lambda x: term if term in x else 'None')

The result I get

text                 term  
'a red apple'        None
'the apple is sweet' None
'a yellow banana'    None
'a green melon'      melon

However, I expected it to be

text                 term  
'a red apple'        apple
'the apple is sweet' apple
'a yellow banana'    banana
'a green melon'      melon

I suspect this is because I loop over a list, but I don't know how to put the loop inside the lambda itself.

Upvotes: 1

Answers (3)

Norton409

Reputation: 109

Using the split method will only work if the term always sits in the same position in the string. You have to swap the loop and the lambda expression, so that the loop over the terms runs inside the function applied to each row, like so:

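import pandas as pd

df = pd.DataFrame(['a red apple',
'a banana yellow ',
'a green melon'], columns=['text'])

the_list = ['apple', 'banana',  'melon']

def fruit_finder(string):
    # Default label when no term from the_list occurs in the string
    term_return = 'None'
    for term in the_list:
        if term in string:
            # If several terms match, the last one in the_list wins
            term_return = term
    return term_return

# Check every term against each row, instead of one term per pass
df['term'] = df['text'].apply(fruit_finder)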

print(df)

This returns the matching value from the list and produces the following output:

               text    term
0       a red apple   apple
1  a banana yellow   banana
2     a green melon   melon

Edit: The reason your initial program doesn't work is that your loop and lambda are the wrong way around. You loop through the terms and, on each pass, overwrite the whole column using only the current term. The last pass checks only for melon, so the apple and banana rows come out as 'None'.
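
To see the overwrite happen, here is a small sketch that reruns the loop from the question and records the column after each pass. The snapshots dictionary is only there for illustration and is not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'text': ['a red apple',
                            'the apple is sweet',
                            'a yellow banana',
                            'a green melon']})
the_list = ['apple', 'banana', 'melon']

# Hypothetical helper: remembers what the column looked like after each pass
snapshots = {}
for term in the_list:
    # Each pass rebuilds the whole column from scratch for this one term
    df['term'] = df['text'].apply(lambda x: term if term in x else 'None')
    snapshots[term] = df['term'].tolist()

for term, column in snapshots.items():
    print(term, column)
```

The apple and banana labels are present after their own pass, but the next pass replaces them, so only the result for the final term survives.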

Upvotes: 1

BENY

Reputation: 323226

Try with str.findall (here l is the list of terms):

df['new'] = df['text'].str.findall('|'.join(l)).str[0]
Out[66]: 
0     apple
1     apple
2    banana
3     melon
Name: text, dtype: object
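
For anyone who wants to run this end to end, here is a minimal sketch along the same lines. It assumes the list is called the_list, escapes each term with re.escape in case a term contains regex metacharacters, and uses str.extract with a single capture group instead of findall followed by str[0]. The 'a purple plum' row is made up here to show what happens when nothing matches.

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ['a red apple',
                            'the apple is sweet',
                            'a yellow banana',
                            'a green melon',
                            'a purple plum']})  # hypothetical non-matching row
the_list = ['apple', 'banana', 'melon']

# Build one alternation pattern, e.g. (apple|banana|melon)
pattern = '(' + '|'.join(re.escape(term) for term in the_list) + ')'

# expand=False returns a Series; rows without a match become missing values
df['term'] = df['text'].str.extract(pattern, expand=False)
print(df)
```

Rows with no match end up as missing values rather than the string 'None', so add a fillna('None') if you need the string label.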

Upvotes: 0

wasif

Reputation: 15480

Use .split

df['term'] = df['text'].apply(lambda x: x.split()[-1] if x.split()[-1] in myList else None)
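
If the term is not always the last word, one possible variation is to check every word of the split string rather than only the final one. This is just a sketch: the names my_list, terms and first_matching_word are made up for the example, and the sample rows mix the strings from the question with one from another answer.

```python
import pandas as pd

df = pd.DataFrame({'text': ['a red apple',
                            'the apple is sweet',
                            'a banana yellow',
                            'a green melon']})
my_list = ['apple', 'banana', 'melon']
terms = set(my_list)  # set membership checks are cheap

def first_matching_word(text):
    # Return the first whitespace-separated word found in terms, else None
    return next((word for word in text.split() if word in terms), None)

df['term'] = df['text'].apply(first_matching_word)
print(df)
```

Unlike a substring test with in, this only matches whole words, so a string such as 'pineapple' would not be labelled apple.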

Upvotes: 1
