Joel
Joel

Reputation: 105

Return groupby results where the index has more than one row in pandas

I have the following dataframe:

ID      Company Name       State
200     Apple              CA
300     Microsoft          CA
300     Grant Thornton     IL
200     Amazon             WA
200     Apple              CA
250     Dollar Tree        VA
250     Dollar Tree        VA
250     Dollar Tree        VA
400     Target             MN
400     Target             MN

Running the following code I get the result:

df.groupby('ID')['Company Name'].value_counts()

ID     Company Name    Count
200    Apple           2
       Amazon          1
250    Dollar Tree     3
300    Microsoft       1
       Grant Thornton  1
400    Target          2

I am looking for a way to return only the results where there are multiple rows associated with each ID. That is, the desired output is:

ID    Company Name    Count
200   Apple           2
      Amazon          1
300   Microsoft       1
      Grant Thornton  1

It doesn't much matter how I get to the solution, it could either be filtering during the original groupby, or saving the original groupby as a series and then running another expression on it. Any and all help would be much appreciated, thanks so much!!

Upvotes: 1

Views: 957

Answers (1)

cs95
cs95

Reputation: 402844

You'll want to pre-filter df based on the uniqueness of "Company Name" per "ID":

df_uniq = df[df.groupby('ID')['Company Name'].transform('nunique') > 1]
df_uniq.groupby('ID')['Company Name'].value_counts()

ID   Company Name  
200  Apple             2
     Amazon            1
300  Grant Thornton    1
     Microsoft         1
Name: Company Name, dtype: int64

You may also filter after the fact:

res = df.groupby('ID')['Company Name'].value_counts()
res[res.groupby(level=0).transform('count') > 1]

ID   Company Name  
200  Apple             2
     Amazon            1
300  Grant Thornton    1
     Microsoft         1
Name: Company Name, dtype: int64

Upvotes: 1

Related Questions