You_Donut
You_Donut

Reputation: 155

Filter dataset from multiple columns of another dataset

Consider this df 'A':

     name  index     pet
0   Alice      2     dog
1     Bob      5     cat
2   Chuck     12     cat
3   Daren      4    bird
4   Emily      9    bird

And then this df 'B':

    pet
0   dog
1   cat
2   dog
3  bird
4   cat
5   cat
6  bird
7   cat
8  bird
9  bird
...

If the value in column 'index' from 'A' and the value from column 'pet' match the actual index of dataset 'B' together with the value in column 'pet' from dataset B, then keep those values, and filter out all the rest.

The resulting dataframe should look like this:

    pet
2   dog
5   cat
9  bird
...

What is the most efficient way to do this? Any help is appreciated.

Data:

dfA:

{'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Emily'],
 'index': [2, 5, 12, 4, 9],
 'pet': ['dog', 'cat', 'cat', 'bird', 'bird']}

dfB:

{'pet': ['dog', 'cat', 'dog', 'bird', 'cat', 'cat', 'bird', 'cat', 'bird', 'bird']}

Upvotes: 1

Views: 189

Answers (3)

rhug123
rhug123

Reputation: 8778

Here is a way using to_records() with isin()

(df2.loc[pd.Series(df2.reset_index()
                   .to_records(index=False)
                   .tolist())
         .isin(df1[['index','pet']]
               .to_records(index=False)
               .tolist())])

Output:

    pet
2   dog
5   cat
9  bird

Upvotes: 0

Matthew Borish
Matthew Borish

Reputation: 3096

You could do a merge.

import pandas as pd

dfa = pd.DataFrame({'name': {0: 'Alice', 1: 'Bob', 2: 'Chuck', 3: 'Daren', 4: 'Emily'},
 'index': {0: 2, 1: 5, 2: 12, 3: 4, 4: 9},
 'pet': {0: 'dog', 1: 'cat', 2: 'cat', 3: 'bird', 4: 'bird'}})

dfb = pd.DataFrame({'pet': {0: 'dog',
  1: 'cat',
  2: 'dog',
  3: 'bird',
  4: 'cat',
  5: 'cat',
  6: 'bird',
  7: 'cat',
  8: 'bird',
  9: 'bird'}})

dfm = pd.merge(dfa, dfb, left_on=['index', 'pet'], right_on=[dfb.index, 'pet'])
dfm = dfm[['index', 'pet']].set_index('index', drop=True)

Output:

    pet
index   
2   dog
5   cat
9   bird

Upvotes: 2

user7864386
user7864386

Reputation:

One option is to reindex dfB with dfA['index'] and evaluate where the "pet" values match:

tmp = dfB.reindex(dfA['index'])
out = tmp[tmp['pet'].eq(dfA.set_index('index')['pet'])].rename_axis([None])

Another option is map dfB.index to "pet" column in dfA and create a boolean mask that shows where the "pet" columns match; then filter dfB:

out = dfB[dfB.index.map(dfA.set_index('index')['pet']) == dfB['pet']]

Output:

    pet
2   dog
5   cat
9  bird

Upvotes: 2

Related Questions