ssast
ssast

Reputation: 789

Python pandas - select by row

I am trying to select rows in a pandas data frame based on it's values matching those of another data frame. Crucially, I only want to match values in rows, not throughout the whole series. For example:

df1 = pd.DataFrame({'a':[1, 2, 3], 'b':[4, 5, 6]})

df2 = pd.DataFrame({'a':[3, 2, 1], 'b':[4, 5, 6]})

I want to select rows where both 'a' and 'b' values from df1 match any row in df2. I have tried:

df1[(df1['a'].isin(df2['a'])) & (df1['b'].isin(df2['b']))]

This of course returns all rows, as the all values are present in df2 at some point, but not necessarily the same row. How can I limit this so the values tested for 'b' are only those rows where the value 'a' was found? So with the example above, I am expecting only row index 1 ([2, 5]) to be returned.

Note that data frames may be of different shapes, and contain multiple matching rows.

Upvotes: 2

Views: 213

Answers (2)

Divakar
Divakar

Reputation: 221524

Similar to this post, here's one using broadcasting -

df1[(df1.values == df2.values[:,None]).all(-1).any(0)]

The idea is :

1) Use np.all for the both part in ""both 'a' and 'b' values"".

2) Use np.any for the any part in "from df1 match any row in df2".

3) Use broadcasting for doing all these in a vectorized fashion by extending dimensions with None/np.newaxis.

Sample run -

In [41]: df1
Out[41]: 
   a  b
0  1  4
1  2  5
2  3  6

In [42]: df2  # Modified to add another row : [1,4] for variety
Out[42]: 
   a  b
0  3  4
1  2  5
2  1  6
3  1  4

In [43]: df1[(df1.values == df2.values[:,None]).all(-1).any(0)]
Out[43]: 
   a  b
0  1  4
1  2  5

Upvotes: 4

piRSquared
piRSquared

Reputation: 294228

use numpy broadcasting

pd.DataFrame((df1.values[:, None] == df2.values).all(2),
             pd.Index(df1.index, name='df1'),
             pd.Index(df2.index, name='df2'))

enter image description here

Upvotes: 0

Related Questions