sectechguy
sectechguy

Reputation: 2117

Pandas find distinct destination where ALL users have visited

I have a dataframe with 2 columns. One for users and the other for destinations. I am looking to find the destination where ALL users visited. This is the dataframe

df1
 user    destination
0  1       dest1
1  2       dest3
2  3       dest4
3  2       dest1
4  2       dest4
5  3       dest1
6  4       dest2
7  4       dest1
8  1       dest4
9  4       dest4

Desired Output:

dest1
dest4

I am working with a dataset with thousands of users and destinations so this must scale out.

Upvotes: 1

Views: 161

Answers (3)

Vaishali
Vaishali

Reputation: 38415

Check for equality between the sets of users grouped by id.

users = df.groupby('destination').user.apply(lambda x: np.equal(set(df.user.unique()),set(x)))
users[users]

You get

destination
dest1    True
dest4    True

If you need the destination as array

users[users].index

Index(['dest1', 'dest4'], dtype='object', name='destination')

Upvotes: 0

vercelli
vercelli

Reputation: 4757

You could try to unstack() the dataframe converting the users into columns. Then drop any row with an empty column (unvisited).

df.groupby(['destination', 'user']).size().unstack().dropna().reset_index().destination

Upvotes: 2

Dani Mesejo
Dani Mesejo

Reputation: 61910

You could try:

# find number of unique users
nunique = df.user.nunique()

# find number of unique users by destination
destinations = df.groupby('destination').user.nunique().to_frame().reset_index()

# find the destinations that match
result = destinations[destinations.user == nunique].drop('user', axis=1)
print(result)

Output

  destination
0       dest1
3       dest4

Upvotes: 2

Related Questions