Pandas: Check Column Membership in Other Column (Same Row)

Question

I have a Pandas DataFrame like this:

       A        B
0   [C, D, E]   C
1   [X, Y, Z]   G

created from:

import pandas as pd
example = pd.DataFrame({"A":[["C", "D", "E"], ["X", "Y", "Z"]], "B":["C", "G"]})

I want to count how often a value occurs both in the list in column A and in column B of the same row.

So the correct output for value C would be 1 and for value Z would be 0. Any suggestions without resorting to going row-by-row (and losing out on vectorization)?

Thanks!

user3483203 · Accepted Answer

Not necessarily a vectorized approach, but using apply (here df is the example DataFrame from the question):

df.apply(lambda x: x['B'] in x['A'], axis=1).astype(int)

0    1
1    0
dtype: int32
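The apply call gives one 0/1 flag per row, while the question asks for a count per value (C gives 1, Z gives 0). One way to get there is to keep only the flagged rows and count their B values. This is a minimal sketch assuming the example frame from the question; the `values` list of values of interest is hypothetical, and the variable names are illustrative:

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({"A": [["C", "D", "E"], ["X", "Y", "Z"]], "B": ["C", "G"]})

# One boolean per row: is this row's B value contained in its A list?
in_a = df.apply(lambda x: x["B"] in x["A"], axis=1)

# For each B value, count the rows where it also appears in A.
# `values` is a hypothetical list of the values you care about;
# reindex fills values that never matched with 0.
values = ["C", "Z"]
counts = df.loc[in_a, "B"].value_counts().reindex(values, fill_value=0)
print(counts)
```

Because value_counts only sees the flagged rows, a value like Z that never appears in column B of a matching row is absent from its result, which is why the reindex with fill_value=0 is needed.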

Edit: I no longer include the np.in1d approach, because of how badly it scaled.

Surprisingly, a basic list comprehension was much faster than apply:

pd.Series([b in a for a, b in zip(df.A, df.B)]).astype(int)
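One caveat with this version: a Series built from a plain list gets a fresh 0..n-1 RangeIndex, so it only lines up with df's rows if df already has that index. Passing index=df.index keeps the two aligned, which matters if you assign the result back as a column. A minimal sketch, where the row labels "r1"/"r2" and the column name "B_in_A" are hypothetical, chosen only to show a non-default index:

```python
import pandas as pd

df = pd.DataFrame({"A": [["C", "D", "E"], ["X", "Y", "Z"]], "B": ["C", "G"]})
# Hypothetical non-default index, to show why index=df.index matters
df.index = ["r1", "r2"]

# Same list comprehension, but reusing df's index so the result
# aligns with df's rows when assigned back as a column
flags = pd.Series(
    [b in a for a, b in zip(df["A"], df["B"])], index=df.index
).astype(int)
df["B_in_A"] = flags
print(df)
```

Without index=df.index, the assignment above would fill the new column with NaN, because the labels 0 and 1 do not exist in df's index.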

Some timings on a 10,000-row frame (the two-row example repeated 5,000 times):

df = pd.concat([df]*5000)

In [158]: %timeit pd.Series([b in a for a, b in zip(df.A, df.B)]).astype(int)
1.55 ms ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [159]: %timeit df.apply(lambda x: x['B'] in x['A'], axis=1).astype(int)
344 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
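A more vectorized alternative, not part of the original answer, is DataFrame.explode (pandas 0.25+). It puts each list element of A on its own row and repeats B, so the membership test becomes an element-wise comparison. This is a minimal sketch that has not been timed against the approaches above. It assumes df has a unique index; after pd.concat([df]*5000) the index repeats, so call df.reset_index(drop=True) first. It can also use a lot of memory when the lists are long:

```python
import pandas as pd

df = pd.DataFrame({"A": [["C", "D", "E"], ["X", "Y", "Z"]], "B": ["C", "G"]})

# One row per list element; B and the original index label are repeated
exploded = df.explode("A")

# Element-wise comparison, then collapse back to one flag per original row:
# a row matches if any of its exploded elements equals its B value.
# sort=False keeps the original row order.
flags = (
    (exploded["A"] == exploded["B"])
    .groupby(level=0, sort=False)
    .any()
    .astype(int)
)
print(flags)
```

Empty lists in A explode to NaN, which never equals B, so such rows come out as 0.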
