Python value counts and return the other columns in pandas

Question

I just started doing my first ML project with Python, and got stuck with one issue. My project is to analyze some animal shelter information. Above is the list of animals have been bought to the shelter. I want to see which animals(based on animal ID) have been bought to the shelter more than 3 times, and then return the dataframe with animal ID, breed, MonthYear information etc. My first attempt:

aac[aac['Animal ID'].value_counts()>3]

After this code, Python tells me: IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

My second attempt:

temp=df.groupby('Animal ID').count() 
temp[temp['Name']>3]

Now I get the following output :

Now I do have the list of animal ID who has been returned to shelter more than 3 times. But I miss the other breed/datatime/monthyear ...information. My question is what is the best way to link animal ID and the other information: breed, monthyear, animaltype etc together. So I can further analyze the returned animals.

Thanks a lot for helping me out!

jpp · Accepted Answer

A groupby operation is not required here.

You can create a value_counts series, filter it for items which occur more than 3 times, and then use this to filter your original dataframe.

This is a minimal example:

df = pd.DataFrame({'col1': range(10), 'col2': range(10, 20)},
                  index=['A', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'C', 'B'])

c = df.index.value_counts()
idx = c[c > 3].index

res = df.loc[df.index.isin(idx)]

print(res)

   col1  col2
A     0    10
A     1    11
B     2    12
A     3    13
B     4    14
A     5    15
B     6    16
B     9    19

Python value counts and return the other columns in pandas

Answers (2)

Related Questions