Sam L
Sam L

Reputation: 171

Count occurrences of item in one dataframe in another

I am currently running into a problem and hoping that someone could assist. Currently have 2 dataframes of items which are hundreds of thousands of lines long. (one has over 200k and one over 180k). the larger of the 2 dataframes is going to contain unique values of users, while the smaller one does not so for example:

df1:
user1
user2
user3
user4
user5

df2:
user1
user1
user5
user4
user5
user5

What i need to do is take each user from df1 and efficiently see if it is in df2 and how many times it occurs.

Thanks!

Upvotes: 7

Views: 4675

Answers (3)

piRSquared
piRSquared

Reputation: 294218

Use pd.factorize and np.bincount to efficiently produce the value counts

f, u = pd.factorize(df2.user)
d = dict(zip(u, np.bincount(f)))

df1['Counts'] = df1.user.map(d)
df1

    user  Counts
0  user1     2.0
1  user2     NaN
2  user3     NaN
3  user4     1.0
4  user5     3.0

Setup

df1 = pd.Series(list('12345')).radd('user').to_frame('user')
df2 = pd.Series(list('115455')).radd('user').to_frame('user')

Upvotes: 3

BENY
BENY

Reputation: 323226

Using value_counts

df1['Newcount']=df1['df1:'].map(df2['df2:'].value_counts())
df1
Out[117]: 
    df1:  Newcount
0  user1       2.0
1  user2       NaN
2  user3       NaN
3  user4       1.0
4  user5       3.0

Upvotes: 13

Ami Tavory
Ami Tavory

Reputation: 76297

Assuming the relevant column in each DataFrame is called 'user', you can use

pd.merge(
    df1,
    df2.user.groupby(df2.user).count(),
    left_on='user',
    right_index=True,
    how='left')

Explanation:

  • The groupby + count will find the number of occurrences of each user. It will create a DataFrame whose index is the user, and the value is the count.

  • The merge left-merges the resulting DataFrame onto df1.

Upvotes: 2

Related Questions