cookie1986

Reputation: 895

How to find set differences per row in a pandas DataFrame?

I have the below DataFrame containing lists of fruits:

import pandas as pd
df = pd.DataFrame(([['apple','pear'],['orange','grapes','apple']],
               [['pear', 'fig','raspberry'],['pineaple', 'raspberry']],
               [['mango'],['melon']]), columns = ['A','B'])

I am trying to create a new column ('C') whose contents are the set difference within each row. More specifically, I need only the fruits left in column A once the fruits in column B have been removed.

A                        B
apple, pear              orange, grapes, apple
pear, fig, raspberry     pineapple, raspberry
mango                    melon

I have read a few similar questions without much luck. So far I have tried the below, which I know does not work, but which hopefully explains what I am trying to do.

df['C'] = [[list(set(row)) in df['A'] - list(set(row)) in df['B']] for row in df]

The intended output would be as follows:

C
pear
pear, fig
mango

Upvotes: 2

Views: 154

Answers (4)

piRSquared

Reputation: 294358

map

df.assign(C=[*map(lambda a, b: {*a} - {*b}, df.A, df.B)])

                        A                        B            C
0           [apple, pear]  [orange, grapes, apple]       {pear}
1  [pear, fig, raspberry]    [pineaple, raspberry]  {pear, fig}
2                 [mango]                  [melon]      {mango}

And without the lambda

def f(a, b): return {*a} - {*b}
df.assign(C=[*map(f, df.A, df.B)])
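Note that `df.assign` returns a new DataFrame and leaves `df` itself unchanged, so the result has to be kept somewhere. A minimal sketch on a smaller, made-up frame (the name `out` is only for illustration):

```python
import pandas as pd

# Small illustrative frame, not the full data from the question
df = pd.DataFrame({
    'A': [['apple', 'pear'], ['mango']],
    'B': [['apple'], ['melon']],
})

def f(a, b):
    # Set difference: items of a that do not appear in b
    return {*a} - {*b}

# assign() builds a new DataFrame; df itself gets no column C
out = df.assign(C=[*map(f, df.A, df.B)])

print('C' in df.columns)   # False
print(out['C'].tolist())   # [{'pear'}, {'mango'}]
```

Writing `df = df.assign(...)` instead would overwrite the original frame with the extended one.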

Upvotes: 2

Quang Hoang

Reputation: 150765

Quick solution (in terms of code, not run time)

df['A'].apply(set) - df['B'].apply(set)

Output:

0         {pear}
1    {fig, pear}
2        {mango}
dtype: object
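To store this as column C, the result can be assigned directly. A minimal sketch, assuming the question's data; the `.map(sorted)` step is an optional extra, not part of the one-liner above, and turns each set into an alphabetically sorted list because sets have no guaranteed display order:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [['apple', 'pear'], ['orange', 'grapes', 'apple']],
        [['pear', 'fig', 'raspberry'], ['pineaple', 'raspberry']],
        [['mango'], ['melon']],
    ],
    columns=['A', 'B'],
)

# Element-wise set difference between the two columns
diff = df['A'].apply(set) - df['B'].apply(set)

# Optional: convert each set to a sorted list for stable output
df['C'] = diff.map(sorted)

print(df['C'].tolist())  # [['pear'], ['fig', 'pear'], ['mango']]
```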

Upvotes: 2

BENY

Reputation: 323306

We can do

df.A.map(set)-df.B.map(set)
Out[343]: 
0         {pear}
1    {fig, pear}
2        {mango}
dtype: object

Upvotes: 6

kosnik

Reputation: 2434

This will do the trick

df['C'] = df.apply(lambda x: set(x['A']).difference(x['B']), axis=1)
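The line above stores Python sets in C, and sets do not keep the order of A. If the order shown in the intended output (pear, fig) matters, one possible variant is sketched below. `ordered_difference` is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [['apple', 'pear'], ['orange', 'grapes', 'apple']],
        [['pear', 'fig', 'raspberry'], ['pineaple', 'raspberry']],
        [['mango'], ['melon']],
    ],
    columns=['A', 'B'],
)

def ordered_difference(a, b):
    """Return the items of a that are not in b, keeping a's order."""
    exclude = set(b)  # set lookup keeps the membership test cheap
    return [item for item in a if item not in exclude]

# Pair up the two columns row by row and build C as a list of lists
df['C'] = [ordered_difference(a, b) for a, b in zip(df['A'], df['B'])]

print(df['C'].tolist())  # [['pear'], ['pear', 'fig'], ['mango']]
```

Unlike a plain set difference, this also keeps any duplicates that appear in A.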

Upvotes: 4
