I have the following DataFrame, whose cells contain lists of fruits:
import pandas as pd
df = pd.DataFrame([[['apple', 'pear'], ['orange', 'grapes', 'apple']],
                   [['pear', 'fig', 'raspberry'], ['pineaple', 'raspberry']],
                   [['mango'], ['melon']]], columns=['A', 'B'])
I am trying to create a new column, 'C', that holds the set difference within each row: only the fruits in column A that are not in column B.
A                     B
apple, pear           orange, grapes, apple
pear, fig, raspberry  pineapple, raspberry
mango                 melon
I have read a few similar questions without much luck. So far I have tried the following, which I know does not work, but which hopefully explains what I am trying to do.
df['C'] = [[list(set(row)) in df['A'] - list(set(row)) in df['B']] for row in df]
The intended output would be as follows:
C
pear
pear, fig
mango
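The set-based answers below return sets, whose iteration order is arbitrary, whereas the intended output above keeps column A's order and is shown as comma-separated text. A minimal sketch of an order-preserving alternative, using a plain list comprehension rather than a set difference and assuming A's original order (and any duplicates in A) should be kept; the helper name `ordered_difference` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [["apple", "pear"], ["orange", "grapes", "apple"]],
        [["pear", "fig", "raspberry"], ["pineaple", "raspberry"]],
        [["mango"], ["melon"]],
    ],
    columns=["A", "B"],
)


def ordered_difference(a, b):
    """Hypothetical helper: items of list a not in list b, in a's order."""
    exclude = set(b)  # a set gives fast membership tests
    return [item for item in a if item not in exclude]


# zip pairs each row's A list with its B list.
c = [ordered_difference(a, b) for a, b in zip(df["A"], df["B"])]
df["C"] = pd.Series(c, index=df.index)

# Optional: render C as comma-separated text, like the intended output.
df["C_text"] = df["C"].str.join(", ")

print(df[["C", "C_text"]])
```

If order does not matter, the set-based one-liners below are shorter.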
map
df.assign(C=[*map(lambda a, b: {*a} - {*b}, df.A, df.B)])
A B C
0 [apple, pear] [orange, grapes, apple] {pear}
1 [pear, fig, raspberry] [pineaple, raspberry] {pear, fig}
2 [mango] [melon] {mango}
And without the lambda:
def f(a, b): return {*a} - {*b}
df.assign(C=[*map(f, df.A, df.B)])
A quick solution (quick to write, not necessarily quick to run):
df['A'].apply(set) - df['B'].apply(set)
Output:
0 {pear}
1 {fig, pear}
2 {mango}
dtype: object
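The "not run time" caveat can be checked rather than taken on faith. A rough sketch, assuming a synthetic frame built by repeating the three example rows (`n_repeats` is an arbitrary, hypothetical size) and timing with the standard-library `timeit`. It first confirms that the approaches in this thread produce the same sets, then prints timings; no particular numbers are implied, since they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame(
    [
        [["apple", "pear"], ["orange", "grapes", "apple"]],
        [["pear", "fig", "raspberry"], ["pineaple", "raspberry"]],
        [["mango"], ["melon"]],
    ],
    columns=["A", "B"],
)

# Hypothetical benchmark size: repeat the example rows to get a larger frame.
n_repeats = 2_000
df = pd.concat([base] * n_repeats, ignore_index=True)

approaches = {
    "apply(set) subtraction": lambda: df["A"].apply(set) - df["B"].apply(set),
    "map(set) subtraction": lambda: df.A.map(set) - df.B.map(set),
    "map with set unpacking": lambda: pd.Series(
        [*map(lambda a, b: {*a} - {*b}, df.A, df.B)], index=df.index
    ),
    "row-wise apply": lambda: df.apply(
        lambda x: set(x["A"]).difference(x["B"]), axis=1
    ),
}

# Every approach should yield the same set in every row.
results = {name: fn().tolist() for name, fn in approaches.items()}
reference = results["apply(set) subtraction"]
assert all(r == reference for r in results.values())

for name, fn in approaches.items():
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{name:>24}: {seconds * 1000:.1f} ms per run")
```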
We can do
df.A.map(set)-df.B.map(set)
Out[343]:
0 {pear}
1 {fig, pear}
2 {mango}
dtype: object
This will do the trick
df['C'] = df.apply(lambda x: set(x['A']).difference(x['B']), axis=1)
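This version converts only column A to a set because `set.difference()` accepts any iterable, while the `-` operator used in the other answers requires sets on both sides. A small sketch illustrating both points with the example data; the variable names are illustrative only:

```python
import pandas as pd

# set.difference() accepts any iterable, so B can stay a list...
a, b = ["pear", "fig", "raspberry"], ["pineaple", "raspberry"]
diff = set(a).difference(b)
print(diff)

# ...whereas the - operator needs a set (or frozenset) on both sides.
raised = False
try:
    set(a) - b
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

df = pd.DataFrame(
    [
        [["apple", "pear"], ["orange", "grapes", "apple"]],
        [["pear", "fig", "raspberry"], ["pineaple", "raspberry"]],
        [["mango"], ["melon"]],
    ],
    columns=["A", "B"],
)

# axis=1 passes each row to the lambda as a Series, so x["A"] and x["B"]
# are that row's two lists.
df["C"] = df.apply(lambda x: set(x["A"]).difference(x["B"]), axis=1)
print(df)
```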