Reputation: 155
Consider this df 'A':
name index pet
0 Alice 2 dog
1 Bob 5 cat
2 Chuck 12 cat
3 Daren 4 bird
4 Emily 9 bird
And then this df 'B':
pet
0 dog
1 cat
2 dog
3 bird
4 cat
5 cat
6 bird
7 cat
8 bird
9 bird
...
If the value in column 'index' from 'A' and the value from column 'pet' match the actual index of dataset 'B' together with the value in column 'pet' from dataset B, then keep those values, and filter out all the rest.
The resulting dataframe should look like this:
pet
2 dog
5 cat
9 bird
...
What is the most efficient way to do this? Any help is appreciated.
Data:
dfA
:
{'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Emily'],
'index': [2, 5, 12, 4, 9],
'pet': ['dog', 'cat', 'cat', 'bird', 'bird']}
dfB
:
{'pet': ['dog', 'cat', 'dog', 'bird', 'cat', 'cat', 'bird', 'cat', 'bird', 'bird']}
Upvotes: 1
Views: 189
Reputation: 8778
Here is a way using to_records()
with isin()
(df2.loc[pd.Series(df2.reset_index()
.to_records(index=False)
.tolist())
.isin(df1[['index','pet']]
.to_records(index=False)
.tolist())])
Output:
pet
2 dog
5 cat
9 bird
Upvotes: 0
Reputation: 3096
You could do a merge.
import pandas as pd
dfa = pd.DataFrame({'name': {0: 'Alice', 1: 'Bob', 2: 'Chuck', 3: 'Daren', 4: 'Emily'},
'index': {0: 2, 1: 5, 2: 12, 3: 4, 4: 9},
'pet': {0: 'dog', 1: 'cat', 2: 'cat', 3: 'bird', 4: 'bird'}})
dfb = pd.DataFrame({'pet': {0: 'dog',
1: 'cat',
2: 'dog',
3: 'bird',
4: 'cat',
5: 'cat',
6: 'bird',
7: 'cat',
8: 'bird',
9: 'bird'}})
dfm = pd.merge(dfa, dfb, left_on=['index', 'pet'], right_on=[dfb.index, 'pet'])
dfm = dfm[['index', 'pet']].set_index('index', drop=True)
Output:
pet
index
2 dog
5 cat
9 bird
Upvotes: 2
Reputation:
One option is to reindex
dfB
with dfA['index']
and evaluate where the "pet" values match:
tmp = dfB.reindex(dfA['index'])
out = tmp[tmp['pet'].eq(dfA.set_index('index')['pet'])].rename_axis([None])
Another option is map
dfB.index to "pet" column in dfA
and create a boolean mask that shows where the "pet" columns match; then filter dfB
:
out = dfB[dfB.index.map(dfA.set_index('index')['pet']) == dfB['pet']]
Output:
pet
2 dog
5 cat
9 bird
Upvotes: 2