Reputation: 817
I have the following pandas DataFrame, df:
attribute
2017-01-01 a
2017-01-01 a
2017-01-05 b
2017-02-01 a
2017-02-10 a
where the first column is a non-unique datetime index. I want to count the number of a's and b's on a weekly basis. If I try df.attribute.resample('W').count(), I get an error because of the duplicate entries.
How can I do that?
Upvotes: 3
Views: 2054
Reputation: 879641
You could use pd.Grouper to group the index by a weekly frequency:
In [83]: df.groupby(pd.Grouper(freq='W')).count()
Out[83]:
attribute
2017-01-01 2
2017-01-08 1
2017-01-15 0
2017-01-22 0
2017-01-29 0
2017-02-05 1
2017-02-12 1
To group by both a weekly frequency and the attribute column you could use:
In [87]: df.groupby([pd.Grouper(freq='W'), 'attribute']).size()
Out[87]:
attribute
2017-01-01 a 2
2017-01-08 b 1
2017-02-05 a 1
2017-02-12 a 1
dtype: int64
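For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's frame and then pivots the result into one column per attribute. The names counts and weekly are only illustrative, and the unstack step is an addition, not part of the answer above.

```python
import pandas as pd

# Rebuild the question's frame: a non-unique DatetimeIndex and one column.
df = pd.DataFrame(
    {"attribute": ["a", "a", "b", "a", "a"]},
    index=pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
    ),
)

# Count rows per (week, attribute) pair; with freq="W" each bin ends on a Sunday.
counts = df.groupby([pd.Grouper(freq="W"), "attribute"]).size()

# Move the attribute level into columns; pairs that never occur become 0.
weekly = counts.unstack("attribute", fill_value=0)
print(weekly)
```

Note that weeks with no rows at all do not appear in this result, because grouping by several keys only returns the combinations that were observed.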
pd.Grouper also has a key parameter which allows you to group by datetimes located in a column rather than the index.
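As a rough illustration of the key parameter, the sketch below assumes the timestamps live in a column that I have called date (a hypothetical name; the question keeps them in the index):

```python
import pandas as pd

# Hypothetical layout: the datetimes are an ordinary column named "date".
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
        ),
        "attribute": ["a", "a", "b", "a", "a"],
    }
)

# key= tells the Grouper which column to bin; freq="W" gives weekly bins.
counts = df.groupby([pd.Grouper(key="date", freq="W"), "attribute"]).size()
print(counts)
```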
Upvotes: 2
Reputation: 323276
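# reset_index() moves the DatetimeIndex into a column named 'index'
df = df.reset_index()
# Series.dt.week was removed in pandas 2.0; dt.isocalendar().week returns the same ISO week number
df.groupby([df['index'].dt.isocalendar().week, 'attribute']).count()
Out[292]:
                index
week attribute
1    b              1
5    a              1
6    a              1
52   a              2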
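Or, on the original df (without resetting the index):
# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week returns the same ISO week number
df.groupby([df.index.isocalendar().week, 'attribute'])['attribute'].count()
Out[303]:
week  attribute
1     b            1
5     a            1
6     a            1
52    a            2
Name: attribute, dtype: int64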
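One thing to watch with week numbers: they repeat every year, and 2017-01-01 falls in ISO week 52 of 2016, which is why it sorts last above. If the data spans more than one year, a possible variant (a sketch, not part of the answer above) is to group by ISO year as well. The names iso and counts are illustrative.

```python
import pandas as pd

# Rebuild the question's frame with its non-unique DatetimeIndex.
df = pd.DataFrame(
    {"attribute": ["a", "a", "b", "a", "a"]},
    index=pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
    ),
)

# isocalendar() returns a frame with 'year', 'week' and 'day' columns,
# aligned row by row with the original index.
iso = df.index.isocalendar()

# Grouping by (ISO year, ISO week, attribute) keeps weeks from different years apart.
counts = df.groupby([iso["year"], iso["week"], "attribute"]).size()
print(counts)
```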
Upvotes: 3
Reputation: 402543
You might be interested in a 2-step process involving a groupby followed by a resample.
df.groupby(level=0).count().resample('W').sum()
attribute
2017-01-01 2.0
2017-01-08 1.0
2017-01-15 NaN
2017-01-22 NaN
2017-01-29 NaN
2017-02-05 1.0
2017-02-12 1.0
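Spelled out step by step, the idea might look like the sketch below. It rebuilds the question's frame; per_timestamp and weekly are illustrative names. Depending on the pandas version, empty weeks may show up as 0 rather than NaN.

```python
import pandas as pd

# Rebuild the question's frame with its non-unique DatetimeIndex.
df = pd.DataFrame(
    {"attribute": ["a", "a", "b", "a", "a"]},
    index=pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
    ),
)

# Step 1: collapse duplicate timestamps into one row each, holding the row count.
per_timestamp = df.groupby(level=0).count()

# Step 2: the index is now unique, so resample it weekly and add up the counts.
weekly = per_timestamp.resample("W").sum()
print(weekly)
```

This gives the total number of rows per week, not a separate count for each attribute value.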
Upvotes: 2