mightypile

Reputation: 8022

How can I extract the indices from a pandas.DataFrame where values intersect another dataframe?

I have two pandas dataframes:

import pandas as pd

friends = pd.DataFrame({
    'name' : ['Alice', 'Jim', 'Edward'],
})

everyone = pd.DataFrame({
    'name' : ['Edward', 'Conrad', 'Lucy', 'Jim', 'Frank', 'Alice', 'Sam']
})

I can get a list of my friends, in the 'everyone' order, with indices.

everyone.loc[everyone['name'].isin(friends['name'])]

I can get a boolean mask of where my friends are in 'everyone'.

everyone['name'].isin(friends['name'])

I even thought I was onto a clunky solution with the following, but it also re-ordered things.

everyone.reset_index().merge(friends, how='right', on='name').set_index('index')

But I can't figure out how to get each friend's ordinal position in the 'everyone' dataframe. Ideally, the solution would add a lookup column to the friends dataframe that looks like the one below: Alice is at position 5 in everyone (counting from 0), Jim is at 3, and Edward is at 0. The result must keep the order of my original friends dataframe.

  name   everyone_id
0 Alice   5
1 Jim     3
2 Edward  0

I could probably write a slow lookup function and friends.apply() it, but I assume pandas has a simpler function or argument that I just can't find.

Upvotes: 1

Views: 50

Answers (1)

jezrael

Reputation: 863421

You can use map with a dictionary in which the index and the values are swapped:

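# Build {index: name}, then swap it into {name: index}
d = everyone['name'].to_dict()
d = {v: k for k, v in d.items()}

# Look up each friend's index in everyone; friends keeps its own order
friends['everyone_id'] = friends['name'].map(d)
print(friends)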
     name  everyone_id
0   Alice            5
1     Jim            3
2  Edward            0
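
One thing to watch with the dictionary lookup: a friend who is missing from everyone maps to NaN, which turns the new column into floats. The sketch below is only an illustration under assumptions: the name 'Zed' and the variable name lookup are made up for the example, and the cast to the nullable Int64 dtype is one possible way to keep integer ids, not part of the original approach.

```python
import pandas as pd

everyone = pd.DataFrame({
    'name': ['Edward', 'Conrad', 'Lucy', 'Jim', 'Frank', 'Alice', 'Sam']
})
# 'Zed' is a hypothetical name that does not appear in everyone
friends = pd.DataFrame({'name': ['Alice', 'Zed', 'Edward']})

# Swap index labels and values: {name: index label}
lookup = {name: idx for idx, name in everyone['name'].items()}

# Unmatched names become NaN, so the mapped column is float64 by default;
# casting to the nullable Int64 dtype keeps matched ids as integers
friends['everyone_id'] = friends['name'].map(lookup).astype('Int64')
print(friends)
```

If a name occurs more than once in everyone, the dictionary keeps only the last index for it, so this assumes the names there are unique.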

A similar solution is to map by a Series:

# Series with the names as its index and everyone's index labels as its values
s = pd.Series(everyone['name'].index, index=everyone['name'].values)
friends['everyone_id'] = friends['name'].map(s)
print(friends)

     name  everyone_id
0   Alice            5
1     Jim            3
2  Edward            0
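
Both approaches above return the index labels of everyone. If what you want is the ordinal position regardless of the index, a possible alternative is Index.get_indexer. This is a minimal sketch, assuming the names in everyone are unique (get_indexer raises an error otherwise); the variable name positions is just for illustration.

```python
import pandas as pd

everyone = pd.DataFrame({
    'name': ['Edward', 'Conrad', 'Lucy', 'Jim', 'Frank', 'Alice', 'Sam']
})
friends = pd.DataFrame({'name': ['Alice', 'Jim', 'Edward']})

# get_indexer returns the integer position of each friend within
# everyone['name'], in friends order, and -1 where there is no match
positions = pd.Index(everyone['name']).get_indexer(friends['name'])
friends['everyone_id'] = positions
print(friends)
```

With the default RangeIndex used here, positions and index labels are the same, so the result matches the map solutions; they would differ only if everyone had a custom index.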

Upvotes: 1
