DrakeMurdoch

Reputation: 859

How to check if an element of one list is another when they are in pandas columns

Given a dataframe

import numpy as np
import pandas as pd

d = {'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
     'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']]}

df = pd.DataFrame(data=d)

I want to add a third column, col3, that holds every element of the list in col2 that also appears in the list in the same row of col1, or np.nan if there is no such element.

All matching elements should be kept, not just the first one.

In this case, then, col3 would be:

           col1                      col2                           col3
0   ['how', 'are', 'you']      ['tell', 'how', 'me', 'you']       ['how', 'you']
1   ['im', 'fine', 'thanks']   ['who', 'cares']                   [np.nan]
2   ['you', 'know']            ['know', 'this', 'padewan']        ['know']
3   [np.nan]                   ['who', 'are', 'you']              [np.nan]

I tried

df['col3'] = [c in l for c, l in zip(df['col1'], df['col2'])]

which doesn't work: it checks whether the whole col1 list is itself an element of the col2 list, so every row comes out False. Any ideas would be super helpful.

Upvotes: 1

Views: 922

Answers (4)

Mayank Porwal

Reputation: 34086

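Take the set intersection of the two lists row by row (note that this leaves an empty list, rather than [np.nan], where nothing matches):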

df['col3'] = [list(set(a).intersection(b)) for a, b in zip(df.col1, df.col2)]

Output:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [you, how]
1  [im, fine, thanks]           [who, cares]          []
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]          []
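
A set intersection does not keep the original order (the first row above came out as [you, how]) and it returns an empty list when nothing matches. If order and the [np.nan] fallback from the question matter, one possible variant is sketched below. The helper name ordered_matches is made up for illustration, and the sketch assumes every cell holds a list of hashable items, as in the question's data.

```python
import numpy as np
import pandas as pd

d = {
    'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
    'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']],
}
df = pd.DataFrame(data=d)


def ordered_matches(a, b):
    """Hypothetical helper: elements of b that also occur in a, in b's order."""
    lookup = set(a)  # set membership tests are fast on average
    matches = [item for item in b if item in lookup]
    return matches or [np.nan]  # fall back to [np.nan] when nothing matches


df['col3'] = [ordered_matches(a, b) for a, b in zip(df['col1'], df['col2'])]
print(df)
```

Unlike the set version, this keeps duplicates that appear in col2; wrap the result in dict.fromkeys(...) if those should be dropped as well.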

Upvotes: 5

Yashica Arora

Reputation: 43

You can loop over the rows and build the new column explicitly:

import numpy
import pandas

d = {'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [numpy.nan]],
     'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'],
              ['who', 'are', 'you']]}
df = pandas.DataFrame(d)

list_col3 = []
for index, row in df.iterrows():
    # Elements shared by the two lists in this row
    common = set(row['col1']).intersection(row['col2'])
    if common:
        list_col3.append(list(common))
    else:
        # No shared elements: fall back to [nan]
        list_col3.append([numpy.nan])
df['col3'] = list_col3
print(df)

Output:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [how, you]
1  [im, fine, thanks]           [who, cares]       [nan]
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]       [nan]

Upvotes: 1

Andrej Kesely

Reputation: 195468

Another version:

df['col3'] = df.apply(lambda x: [*set(x['col1']).intersection(x['col2'])] or [np.nan], axis=1)

print(df)

Prints:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [how, you]
1  [im, fine, thanks]           [who, cares]       [nan]
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]       [nan]

Upvotes: 3

Quang Hoang

Reputation: 150765

I'd write a separate helper function around np.intersect1d and apply it row by row:

def intersect_nan(a, b):
    # np.intersect1d returns the sorted, unique values found in both inputs
    ret = np.intersect1d(a, b)
    return list(ret) if len(ret) > 0 else [np.nan]

df['col3'] = [intersect_nan(a, b) for a, b in zip(df['col1'], df['col2'])]

Output:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [how, you]
1  [im, fine, thanks]           [who, cares]       [nan]
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]       [nan]
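
As a small illustration of what np.intersect1d does on its own, here is a sketch using made-up sample lists a and b rather than the dataframe. The result is a NumPy array of the sorted, unique shared values, so the original order and any duplicates are not preserved. Calling .tolist() is an optional extra step, not part of the answer above, that converts the NumPy string scalars back to plain Python strings.

```python
import numpy as np

a = ['you', 'how', 'you']          # unsorted, with a duplicate
b = ['tell', 'how', 'me', 'you']

common = np.intersect1d(a, b)      # NumPy array of sorted, unique shared values
common_list = common.tolist()      # plain Python strings instead of NumPy scalars
print(common_list)
```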

Upvotes: 2
