ceiling cat

Reputation: 5701

How do I count the frequency against a specific list?

I have a DataFrame that looks like this.

                date name
0 2015-06-13 00:21:25    a
1 2015-06-13 01:00:25    b
2 2015-06-13 02:54:48    c
3 2015-06-15 14:38:15    a
4 2015-06-15 15:29:28    b

I want to count how many rows fall on each date in a specific date range, including dates that do not appear in the column at all (the name column can be ignored). For example, I might have a date range that looks like this:

periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')

Then, I want an output that looks something like:

date       count    
2015-06-13 3
2015-06-14 0
2015-06-15 2
2015-06-16 0

I haven't been able to find a function that lets me keep the rows with a count of 0.

Upvotes: 1

Views: 62

Answers (2)

Alexander

Reputation: 109528

This is very similar to the solution of @jezrael, but uses a groupby instead of value_counts:

>>> (pd.DataFrame(df.groupby(df.date.dt.date)['name']
                    .count()
                    .reindex(periods)
                    .fillna(0))
     .rename(columns={'name': 'count'}))
            count
2015-06-13      3
2015-06-14      0
2015-06-15      2
2015-06-16      0

Note: In Pandas 0.18.0 the reindex operation changes the type of count from ints to floats, so if you are using that version you'll need to tack on .astype(int) to the end.
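
For anyone on a more recent pandas, here is a minimal sketch of the same groupby idea, not a drop-in replacement for the code above. It makes two assumptions of mine: that `.dt.date` may no longer align with a `DatetimeIndex` on reindex (it yields plain Python date objects), so the sketch groups on `.dt.normalize()` instead, and that passing `fill_value=0` to `reindex` is acceptable in place of `fillna(0)`. With `fill_value` no NaN is ever introduced, so the counts should stay integers and the `.astype(int)` step is not needed. The sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-06-13 00:21:25', '2015-06-13 01:00:25',
                            '2015-06-13 02:54:48', '2015-06-15 14:38:15',
                            '2015-06-15 15:29:28']),
    'name': ['a', 'b', 'c', 'a', 'b'],
})
periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')

# normalize() truncates each timestamp to midnight but keeps datetime64,
# so the group keys can be matched against the DatetimeIndex in `periods`.
counts = (df.groupby(df['date'].dt.normalize())
            .size()                          # rows per day, integer dtype
            .reindex(periods, fill_value=0)  # missing days become 0, no NaN
            .rename('count'))

print(counts)
print(counts.dtype)
```

Using `size()` rather than `count()` counts rows regardless of what is in the name column, which matches the "ignore name" part of the question.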

Upvotes: 1

jezrael

Reputation: 862481

I think you can first take the date part of the date column and pass it to value_counts, then reindex by periods and fill the missing dates with 0 using fillna. Finally, convert the floats back to int with astype and call reset_index:

df = df['date'].dt.date.value_counts()
print(df)
2015-06-13    3
2015-06-15    2
Name: date, dtype: int64

periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')

df = df.reindex(periods).fillna(0).astype(int).reset_index()
df.columns = ['date','count']
print(df)
        date  count
0 2015-06-13      3
1 2015-06-14      0
2 2015-06-15      2
3 2015-06-16      0
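
If you need this more than once, the steps can be wrapped in a small function. This is only a sketch: `count_per_day` is a hypothetical name of mine, not a pandas function, and it assumes the column is already datetime64. It also swaps `.dt.date` for `.dt.normalize()` so that the index stays datetime-typed and lines up with `periods`, and it uses `reindex(..., fill_value=0)` so the float round trip is avoided.

```python
import pandas as pd

def count_per_day(dates, periods):
    """Hypothetical helper: count timestamps per day, keeping empty days as 0."""
    # Truncate to midnight while keeping the datetime64 dtype.
    counts = dates.dt.normalize().value_counts()
    # Align to the requested range; days with no rows get 0 instead of NaN.
    counts = counts.reindex(periods, fill_value=0)
    # Turn the index into a 'date' column next to a 'count' column.
    return counts.rename_axis('date').reset_index(name='count')

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-06-13 00:21:25', '2015-06-13 01:00:25',
                            '2015-06-13 02:54:48', '2015-06-15 14:38:15',
                            '2015-06-15 15:29:28']),
    'name': ['a', 'b', 'c', 'a', 'b'],
})
periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')

result = count_per_day(df['date'], periods)
print(result)
```

Unlike the snippet above, this leaves the original `df` untouched instead of overwriting it with the counts.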

Upvotes: 2
