stephenb

Reputation: 1142

Pandas column for count by date

I have a dataframe with a datetime index. I would like to add a column that holds, for each row, the number of rows that fall on the same calendar day.

dff = pd.DataFrame(['red','red','blue'],
    columns = ['colors'],
    index = [pd.Timestamp('2019-09-19 14:03:20'),pd.Timestamp('2019-09-19 17:03:20'),pd.Timestamp('2019-09-20 14:03:20')])

                     colors
2019-09-19 14:03:20     red
2019-09-19 17:03:20     red
2019-09-20 14:03:20    blue

So the rows on 2019-09-19 should have a value of 2 in the new 'count' column, and the last row should have a value of 1.

Upvotes: 0

Views: 875

Answers (1)

Evan

Reputation: 2546

This temporarily creates a column holding just the date part of the index, groups by that column, and writes each group's row count into a new column called counts in the original data frame.

dff["counts"] = dff.assign(date_col = lambda x: x.index.date).groupby('date_col')['colors'].transform('count')

Here's the whole thing to paste into an IDE and test:

import pandas as pd

dff = pd.DataFrame(['red','red','blue'],
    columns = ['colors'],
    index = [pd.Timestamp('2019-09-19 14:03:20'),pd.Timestamp('2019-09-19 17:03:20'),pd.Timestamp('2019-09-20 14:03:20')])

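# Select a single column before transform so the result is a Series,
# which keeps the assignment valid if dff later gains more columns
dff["counts"] = dff.assign(date_col = lambda x: x.index.date).groupby('date_col')['colors'].transform('count')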

print(dff)

And the result:

                     colors  counts
2019-09-19 14:03:20     red       2
2019-09-19 17:03:20     red       2
2019-09-20 14:03:20    blue       1
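
If you'd rather skip the temporary column, here is a rough sketch of a variant that groups directly on the index floored to midnight. It is only one possible approach, not the method above: the variable name day is my own, and it swaps 'count' for 'size', which counts every row in the group, including rows where colors is missing.

```python
import pandas as pd

dff = pd.DataFrame(
    {"colors": ["red", "red", "blue"]},
    index=pd.to_datetime(
        ["2019-09-19 14:03:20", "2019-09-19 17:03:20", "2019-09-20 14:03:20"]
    ),
)

# normalize() floors each timestamp to midnight, so rows on the
# same calendar day share one grouping key ("day" is an illustrative name)
day = dff.index.normalize()

# transform("size") broadcasts each group's row count back to
# every row of that group, aligned with the original index
dff["counts"] = dff.groupby(day)["colors"].transform("size")

print(dff)
```

On the sample data this should give the same counts as above (2, 2, 1). If you need to ignore missing values in a particular column, stick with 'count' on that column.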

Upvotes: 2
