Reputation: 53
I just started doing my first ML project with Python, and got stuck with one issue. My project is to analyze some animal shelter information. Above is the list of animals have been bought to the shelter. I want to see which animals(based on animal ID) have been bought to the shelter more than 3 times, and then return the dataframe with animal ID, breed, MonthYear information etc. My first attempt:
aac[aac['Animal ID'].value_counts()>3]
After this code, Python tells me: IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
My second attempt:
temp=df.groupby('Animal ID').count()
temp[temp['Name']>3]
Now I get the following output :
Now I do have the list of animal ID who has been returned to shelter more than 3 times. But I miss the other breed/datatime/monthyear ...information. My question is what is the best way to link animal ID and the other information: breed, monthyear, animaltype etc together. So I can further analyze the returned animals.
Thanks a lot for helping me out!
Upvotes: 2
Views: 2155
Reputation: 1437
You could create another column with counts, and use it for filtering.
df = pandas.DataFrame({'col1': range(10), 'col2': range(10, 20), 'col3': ['A', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'C', 'B']})
df = df.merge(df.groupby('col3').apply(len).to_frame(name='counts'), left_on='col3', right_index=True)
df[df.counts > 2]
Upvotes: 0
Reputation: 164693
A groupby
operation is not required here.
You can create a value_counts
series, filter it for items which occur more than 3 times, and then use this to filter your original dataframe.
This is a minimal example:
df = pd.DataFrame({'col1': range(10), 'col2': range(10, 20)},
index=['A', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'C', 'B'])
c = df.index.value_counts()
idx = c[c > 3].index
res = df.loc[df.index.isin(idx)]
print(res)
col1 col2
A 0 10
A 1 11
B 2 12
A 3 13
B 4 14
A 5 15
B 6 16
B 9 19
Upvotes: 3