Reputation: 1142
I have a DataFrame with a datetime index. I would like to add a column that holds, for each row, the number of rows that fall on the same day.
import pandas as pd

dff = pd.DataFrame(['red','red','blue'],
                   columns = ['colors'],
                   index = [pd.Timestamp('2019-09-19 14:03:20'),
                            pd.Timestamp('2019-09-19 17:03:20'),
                            pd.Timestamp('2019-09-20 14:03:20')])
                     colors
2019-09-19 14:03:20     red
2019-09-19 17:03:20     red
2019-09-20 14:03:20    blue
So the rows falling on 2019-09-19 should have a 'count' value of 2, and the last row a 'count' value of 1.
Upvotes: 0
Views: 875
Reputation: 2546
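This builds a temporary column holding just the date part of the index, groups by that column, and uses transform('count') to write each day's row count into a new column called counts in the real DataFrame. The temporary column only exists on the copy returned by assign, so dff itself gains nothing but counts.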
dff["counts"] = dff.assign(date_col = lambda x: x.index.date).groupby(['date_col']).transform('count')
Here's the whole thing to paste into an IDE and test:
import pandas as pd

dff = pd.DataFrame(['red','red','blue'],
                   columns = ['colors'],
                   index = [pd.Timestamp('2019-09-19 14:03:20'),
                            pd.Timestamp('2019-09-19 17:03:20'),
                            pd.Timestamp('2019-09-20 14:03:20')])

dff["counts"] = dff.assign(date_col = lambda x: x.index.date).groupby(['date_col']).transform('count')

print(dff)
And the result:
                     colors  counts
2019-09-19 14:03:20     red       2
2019-09-19 17:03:20     red       2
2019-09-20 14:03:20    blue       1
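For what it's worth, here is a possible variation that skips the temporary column, offered as a sketch rather than a replacement for the one-liner above. It assumes the index is a DatetimeIndex, groups directly on the index normalized to midnight, and picks a single column before calling transform so the result is a Series even when the frame has more than one column. The name `day_key` is only illustrative. It uses 'size' instead of 'count', which is my substitution: 'size' counts every row in the group, while 'count' skips missing values.

```python
import pandas as pd

dff = pd.DataFrame(
    ['red', 'red', 'blue'],
    columns=['colors'],
    index=[pd.Timestamp('2019-09-19 14:03:20'),
           pd.Timestamp('2019-09-19 17:03:20'),
           pd.Timestamp('2019-09-20 14:03:20')],
)

# Group key (illustrative name): each timestamp floored to midnight of its day.
day_key = dff.index.normalize()

# Select one column so transform returns a Series aligned to dff's index;
# 'size' gives the number of rows in each day's group, NaN rows included.
dff['counts'] = dff.groupby(day_key)['colors'].transform('size')

print(dff)
```

If rows with a missing value in colors should not be counted, swapping 'size' back to 'count' keeps the original behaviour.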
Upvotes: 2