Reputation: 23
I've got a DataFrame called new_dh of web requests that looks like this (there are more columns):
                    s-sitename  sc-win32-status
date_time
2006-11-01 00:00:00     W3SVC1              0.0
2006-11-01 00:00:00     W3SVC1              0.0
2006-11-01 01:00:00     W3SVC1              0.0
2006-11-01 01:00:00     W3SVC1              0.0
2006-11-01 02:00:00     W3SVC1              0.0
2007-02-28 02:00:00     W3SVC1              0.0
2007-02-28 10:00:00     W3SVC1              0.0
2007-02-28 23:00:00     W3SVC1              0.0
2007-02-28 23:00:00     W3SVC1              0.0
2007-02-28 23:00:00     W3SVC1              0.0
What I would like to do is group by the hour of the DatetimeIndex (the actual date of the request does not matter, only the hour; all times have already been rounded down to the hour) and return:
      count
hour
0         2
1         2
2         2
10        1
23        3
Any help would be much appreciated.
I have tried
new_dh.groupby([new_dh.index.hour]).count()
but it returns many columns with the same values, whereas I only want the single count column shown above.
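For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame (assuming only the two visible columns; the real data has more) and runs the attempted groupby. It shows why the output looks repetitive: count() on a whole grouped DataFrame returns one column per original column.

```python
import pandas as pd

# Hypothetical rebuild of the sample above (only the two visible columns)
times = pd.DatetimeIndex(
    ["2006-11-01 00:00", "2006-11-01 00:00", "2006-11-01 01:00",
     "2006-11-01 01:00", "2006-11-01 02:00", "2007-02-28 02:00",
     "2007-02-28 10:00", "2007-02-28 23:00", "2007-02-28 23:00",
     "2007-02-28 23:00"],
    name="date_time",
)
new_dh = pd.DataFrame(
    {"s-sitename": ["W3SVC1"] * 10, "sc-win32-status": [0.0] * 10},
    index=times,
)

# The attempted approach: every column gets its own (identical) count
attempt = new_dh.groupby([new_dh.index.hour]).count()
print(attempt)
```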
Upvotes: 1
Views: 52
Reputation: 3629
new_dh['hour'] = new_dh.index.hour   # add the hour of each timestamp as a new column
new_dh.groupby('hour')['hour'].count()   # count rows per hour
Result
hour
0     2
1     2
2     2
10    1
23    3
Name: hour, dtype: int64
If you need a DataFrame as the result:
new_dh.groupby('hour')['hour'].count().rename('count').to_frame()
In this case, the result will be:
      count
hour
0         2
1         2
2         2
10        1
23        3
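If you would rather not add a helper column to new_dh, a sketch like the following groups by the hour values directly. Naming the grouping Index "hour" gives the result's index that label. Note it swaps count() for size(), which counts rows including missing values; the two agree here only because the sample (a hypothetical rebuild of the question's data) has no NaN.

```python
import pandas as pd

# Hypothetical rebuild of the question's sample (only the two visible columns)
times = pd.DatetimeIndex(
    ["2006-11-01 00:00", "2006-11-01 00:00", "2006-11-01 01:00",
     "2006-11-01 01:00", "2006-11-01 02:00", "2007-02-28 02:00",
     "2007-02-28 10:00", "2007-02-28 23:00", "2007-02-28 23:00",
     "2007-02-28 23:00"],
    name="date_time",
)
new_dh = pd.DataFrame(
    {"s-sitename": ["W3SVC1"] * 10, "sc-win32-status": [0.0] * 10},
    index=times,
)

# Group by the hour of the index; the Index name "hour" becomes the result's index name
hours = new_dh.index.hour.rename("hour")
result = new_dh.groupby(hours).size().to_frame("count")
print(result)
```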
Upvotes: 1
Reputation: 24314
You can also do this by using the groupby() and assign() methods.
If 'date_time' is a regular column rather than your index:
result = df.assign(hour=df['date_time'].dt.hour).groupby('hour').agg(count=('s-sitename', 'count'))
If it's your index, use:
result = df.groupby(df.index.hour)['s-sitename'].count().to_frame('count')
result.index.name = 'hour'
Now if you print result, you get the desired output:
      count
hour
0         2
1         2
2         2
10        1
23        3
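Both variants can be checked side by side with a sketch like this, built on a hypothetical rebuild of the question's sample (only 's-sitename' kept). The named aggregation syntax agg(count=(column, func)) needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical rebuild of the question's timestamps
times = pd.to_datetime(
    ["2006-11-01 00:00", "2006-11-01 00:00", "2006-11-01 01:00",
     "2006-11-01 01:00", "2006-11-01 02:00", "2007-02-28 02:00",
     "2007-02-28 10:00", "2007-02-28 23:00", "2007-02-28 23:00",
     "2007-02-28 23:00"]
)

# Variant 1: 'date_time' is an ordinary column
df = pd.DataFrame({"date_time": times, "s-sitename": ["W3SVC1"] * 10})
by_column = (
    df.assign(hour=df["date_time"].dt.hour)
      .groupby("hour")
      .agg(count=("s-sitename", "count"))
)

# Variant 2: 'date_time' is the index
df_idx = df.set_index("date_time")
by_index = df_idx.groupby(df_idx.index.hour)["s-sitename"].count().to_frame("count")
by_index.index.name = "hour"

print(by_column)
print(by_index)
```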
Upvotes: 0
Reputation: 862511
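If you need a DatetimeIndex in the output (counts per hour of each specific date rather than per hour of day), use DataFrame.resample:
new_dh.resample('H')['s-sitename'].count()
or group by the index floored to the hour:
new_dh.groupby(new_dh.index.floor('H'))['s-sitename'].count()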
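The two lines above are not quite interchangeable, as a quick sketch on a hypothetical rebuild of the sample shows: resample creates a bin for every hour between the first and last timestamp (empty hours get a count of 0), while grouping by the floored index only lists hours that actually occur. The sketch uses the 'h' alias, which recent pandas versions (2.2+) prefer over the deprecated 'H'.

```python
import pandas as pd

# Hypothetical rebuild of the question's sample (only 's-sitename' kept)
times = pd.DatetimeIndex(
    ["2006-11-01 00:00", "2006-11-01 00:00", "2006-11-01 01:00",
     "2006-11-01 01:00", "2006-11-01 02:00", "2007-02-28 02:00",
     "2007-02-28 10:00", "2007-02-28 23:00", "2007-02-28 23:00",
     "2007-02-28 23:00"],
    name="date_time",
)
new_dh = pd.DataFrame({"s-sitename": ["W3SVC1"] * 10}, index=times)

# One bin per hour across the whole range, including empty hours (count 0)
per_hour_resample = new_dh.resample("h")["s-sitename"].count()

# Only the hours that appear in the data
per_hour_floor = new_dh.groupby(new_dh.index.floor("h"))["s-sitename"].count()

print(len(per_hour_resample), len(per_hour_floor))
print(per_hour_floor)
```

If you only want the non-empty hours from the resample version, filtering with per_hour_resample[per_hour_resample > 0] gives the same rows as the floor version.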
The problem with your solution is that GroupBy.count counts the non-missing values of every column per hour, so if there are no missing values you get several columns with the same values. A possible solution is to select the column after groupby:
new_dh.groupby([new_dh.index.hour])['s-sitename'].count()
To show how count excludes missing values, some values in the data were changed to NaN:
print(new_dh)
                    s-sitename  sc-win32-status
date_time
2006-11-01 00:00:00     W3SVC1              0.0
2006-11-01 00:00:00     W3SVC1              0.0
2006-11-01 01:00:00     W3SVC1              0.0
2006-11-01 01:00:00     W3SVC1              0.0
2006-11-01 02:00:00        NaN              0.0
2007-02-28 02:00:00     W3SVC1              0.0
2007-02-28 10:00:00     W3SVC1              0.0
2007-02-28 23:00:00        NaN              0.0
2007-02-28 23:00:00        NaN              0.0
2007-02-28 23:00:00     W3SVC1              0.0
df = new_dh.groupby([new_dh.index.hour]).count()
print(df)
           s-sitename  sc-win32-status
date_time
0                   2                2
1                   2                2
2                   1                2
10                  1                1
23                  1                3
If the column is specified:
s = new_dh.groupby([new_dh.index.hour])['s-sitename'].count()
print(s)
date_time
0     2
1     2
2     1
10    1
23    1
Name: s-sitename, dtype: int64
df = new_dh.groupby([new_dh.index.hour])['s-sitename'].count().to_frame()
print(df)
           s-sitename
date_time
0                   2
1                   2
2                   1
10                  1
23                  1
If you need to count missing values too, use GroupBy.size:
s = new_dh.groupby([new_dh.index.hour])['s-sitename'].size()
print(s)
date_time
0     2
1     2
2     2
10    1
23    3
Name: s-sitename, dtype: int64
df = new_dh.groupby([new_dh.index.hour])['s-sitename'].size().to_frame()
print(df)
           s-sitename
date_time
0                   2
1                   2
2                   2
10                  1
23                  3
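To see the count/size difference in one table, a sketch like this aggregates both at once on the modified data above (rebuilt here by hand) and uses rename_axis to label the index 'hour', as the question asked. Combining them with agg(['count', 'size']) is my own choice, not something from the answer.

```python
import numpy as np
import pandas as pd

# Rebuild of the modified sample above: some 's-sitename' values are NaN
times = pd.DatetimeIndex(
    ["2006-11-01 00:00", "2006-11-01 00:00", "2006-11-01 01:00",
     "2006-11-01 01:00", "2006-11-01 02:00", "2007-02-28 02:00",
     "2007-02-28 10:00", "2007-02-28 23:00", "2007-02-28 23:00",
     "2007-02-28 23:00"],
    name="date_time",
)
new_dh = pd.DataFrame(
    {
        "s-sitename": ["W3SVC1", "W3SVC1", "W3SVC1", "W3SVC1", np.nan,
                       "W3SVC1", "W3SVC1", np.nan, np.nan, "W3SVC1"],
        "sc-win32-status": [0.0] * 10,
    },
    index=times,
)

# 'count' skips NaN, 'size' counts every row in the group
summary = (
    new_dh.groupby(new_dh.index.hour)["s-sitename"]
          .agg(["count", "size"])
          .rename_axis("hour")
)
print(summary)
```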
Upvotes: 3