Reputation: 5701
I have a DataFrame that looks like this:
date name
0 2015-06-13 00:21:25 a
1 2015-06-13 01:00:25 b
2 2015-06-13 02:54:48 c
3 2015-06-15 14:38:15 a
4 2015-06-15 15:29:28 b
I want to count how many rows fall on each date in a given date range, including dates that do not appear in the column (and ignoring whatever is in the name column). For example, I might have a date range that looks like this:
periods = pd.date_range('2015-06-13', '2015-06-16', freq = 'd')
Then, I want an output that looks something like:
date count
2015-06-13 3
2015-06-14 0
2015-06-15 2
2015-06-16 0
I haven't been able to find any function that lets me keep the rows with a count of 0.
Upvotes: 1
Views: 62
Reputation: 109528
This is very similar to the solution of @jezrael, but uses a groupby instead of value_counts:
>>> (pd.DataFrame(df.groupby(df.date.dt.date)['name']
...               .count()
...               .reindex(periods)
...               .fillna(0))
...  .rename(columns={'name': 'count'}))
count
2015-06-13 3
2015-06-14 0
2015-06-15 2
2015-06-16 0
Note: In Pandas 0.18.0, the reindex operation changes the dtype of count from int to float, so if you are using that version you'll need to tack .astype(int) onto the end.
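The float conversion happens because reindex fills the missing dates with NaN before fillna runs. A minimal sketch of an alternative, assuming a pandas version where reindex accepts fill_value (the small counts Series here is made up for illustration and stands in for the groupby result):

```python
import pandas as pd

# Hypothetical stand-in for the per-day counts produced by the groupby above.
counts = pd.Series(
    [3, 2],
    index=pd.to_datetime(["2015-06-13", "2015-06-15"]),
    name="count",
)
periods = pd.date_range("2015-06-13", "2015-06-16", freq="D")

# reindex inserts NaN for the missing days, which forces a float dtype.
via_fillna = counts.reindex(periods).fillna(0)

# Passing fill_value puts 0 in directly, so the integer dtype is kept.
via_fill_value = counts.reindex(periods, fill_value=0)

print(via_fillna.dtype)
print(via_fill_value.dtype)
```

With fill_value there is no NaN step, so the trailing .astype(int) should not be needed.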
Upvotes: 1
Reputation: 862481
I think you can first take the date from the date column and call value_counts, then reindex by periods with fillna by 0. Finally, convert float to int with astype and call reset_index:
df = df['date'].dt.date.value_counts()
print(df)
2015-06-13 3
2015-06-15 2
Name: date, dtype: int64
periods = pd.date_range('2015-06-13', '2015-06-16', freq = 'd')
df = df.reindex(periods).fillna(0).astype(int).reset_index()
df.columns = ['date','count']
print(df)
date count
0 2015-06-13 3
1 2015-06-14 0
2 2015-06-15 2
3 2015-06-16 0
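For reference, here is a self-contained sketch of the same idea for Python 3. It makes one substitution that is not in the code above: it uses dt.normalize() rather than dt.date, on the assumption that, depending on the pandas version, plain datetime.date labels may not line up with the Timestamps in periods during reindex. The sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-06-13 00:21:25",
        "2015-06-13 01:00:25",
        "2015-06-13 02:54:48",
        "2015-06-15 14:38:15",
        "2015-06-15 15:29:28",
    ]),
    "name": ["a", "b", "c", "a", "b"],
})
periods = pd.date_range("2015-06-13", "2015-06-16", freq="D")

result = (
    df["date"].dt.normalize()           # truncate each timestamp to midnight
      .value_counts()                   # rows per day, only for days present
      .reindex(periods, fill_value=0)   # add the missing days with a count of 0
      .rename_axis("date")              # name the index so it becomes the date column
      .reset_index(name="count")
)
print(result)
```

Because normalize() keeps the values as datetime64, the index and periods have the same type, and fill_value=0 avoids the round trip through float.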
Upvotes: 2