Reputation: 129
I have dataset (dataset1) that looks as follows:
Date Company Weekday
2015-01-01 Company1 Monday
2015-01-02 Company1 Tuesday
2015-01-03 Company1 Wednesday
2015-01-04 Company1 Thursday
2015-12-09 Company2 Monday
2015-12-10 Company2 Tuesday
………………………………………………………………………
2016-01-08 Company3 Wednesday
2016-01-09 Company3 Thursday
I then apply the following code:
dataset2 = dataset1.groupby(['Company','Weekday']).size().sort_values(ascending=False)
Once the above code has been applied, I get the following results:
Index 0
('Company1', Monday) 80
('Company1', Tuesday) 80
('Company1', Wednesday) 79
………………………………………………………………….
('Company3', Tuesday) 34
I am trying to isolate all dataset2 entries with a count values above 50, but I get all kinds of errors when I try the following:
dataset2=dataset2.loc[dataset2[0]>50]
Can anyone offer an opinion?
Upvotes: 1
Views: 46
Reputation: 863801
Working with Series
, so need:
dataset2 = dataset1.groupby(['Company','Weekday']).size().sort_values(ascending=False)
dataset2 = dataset2[dataset2 > 50]
Another solution is add Series.reset_index
with parameter name
for DataFrame
and then filter by column count
:
dataset2 = (dataset1.groupby(['Company','Weekday'])
.size()
.sort_values(ascending=False)
.reset_index(name='count'))
dataset2 = dataset2[dataset2['count'] > 50]
Upvotes: 3