Reputation: 3
I have a dataframe df
with a list of datetime values and colours occurring each day that looks like this:
Date Info1
2020-02-03 12:03:22 RED
2020-02-03 16:11:21 RED
2020-02-03 09:00:00 GRN
2020-02-04 01:00:23 GRN
2020-02-05 06:12:55 RED
2020-02-05 20:14:00 GRN
2020-02-05 12:00:00 YLW
2020-02-05 22:01:12 RED
2020-02-05 11:11:09 GRN
2020-02-05 23:11:45 BLU
2020-02-06 12:00:00 BLU
2020-02-06 08:27:11 GRN
I am trying to create a DataFrame counting how many of each colour I have per day (regardless of the time), so the result should look like this: (If a day is missing I don't need to add it)
Date RED GRN YLW BLU
2020-02-03 2 1 0 0
2020-02-04 0 1 0 0
2020-02-05 1 2 1 1
2020-02-06 0 1 0 0
I have listed all the existing colours in Info1 with
colours = df.Info1.unique()
I was thinking to cycle each element of this list and use it as a condition with a count(), but I can't figure out how.
So far I managed to count every non-null value of each day with
result=df.groupby(pd.Grouper(key='Date',freq='D'))['Info1'].count()
but I can't figure out how to specify a condition to count only a specific colour. Could someone point me in the right direction?
Upvotes: 0
Views: 66
Reputation: 28649
as rightly pointed out by @datanovice, crosstab can solve this for you
#read in data
df = pd.read_clipboard(sep='\s{2,}', parse_dates=['Date'])
pd.crosstab(df.Date.dt.date, df.Info1)
Info1 BLU GRN RED YLW
Date
2020-02-03 0 1 2 0
2020-02-04 0 1 0 0
2020-02-05 1 2 2 1
2020-02-06 1 1 0 0
Upvotes: 2
Reputation: 360
Look into the size
method. Here's something that works:
(
df.groupby(['Date', 'Info1'])
.size()
.reset_index()
.rename(columns={0:'count'})
.pivot(index='Date', columns='Info1', values='count')
.reset_index()
.fillna(0)
)
Upvotes: 0