Reputation: 105
I have the following dataframe:
ID Company Name State
200 Apple CA
300 Microsoft CA
300 Grant Thornton IL
200 Amazon WA
200 Apple CA
250 Dollar Tree VA
250 Dollar Tree VA
250 Dollar Tree VA
400 Target MN
400 Target MN
Running the following code I get the result:
df.groupby('ID')['Company Name'].value_counts()
ID Company Name Count
200 Apple 2
Amazon 1
250 Dollar Tree 3
300 Microsoft 1
Grant Thornton 1
400 Target 2
I am looking for a way to return only the results where there are multiple rows associated with each ID. That is, the desired output is:
ID Company Name Count
200 Apple 2
Amazon 1
300 Microsoft 1
Grant Thornton 1
It doesn't much matter how I get to the solution, it could either be filtering during the original groupby, or saving the original groupby as a series and then running another expression on it. Any and all help would be much appreciated, thanks so much!!
Upvotes: 1
Views: 957
Reputation: 402844
You'll want to pre-filter df
based on the uniqueness of "Company Name" per "ID":
df_uniq = df[df.groupby('ID')['Company Name'].transform('nunique') > 1]
df_uniq.groupby('ID')['Company Name'].value_counts()
ID Company Name
200 Apple 2
Amazon 1
300 Grant Thornton 1
Microsoft 1
Name: Company Name, dtype: int64
You may also filter after the fact:
res = df.groupby('ID')['Company Name'].value_counts()
res[res.groupby(level=0).transform('count') > 1]
ID Company Name
200 Apple 2
Amazon 1
300 Grant Thornton 1
Microsoft 1
Name: Company Name, dtype: int64
Upvotes: 1