Reputation: 366
I have the following pandas DataFrame in python:
id | entry_time | other_columns |
---|---|---|
1 | 16:02:04 | other_values |
2 | 15:02:04 | other_values |
3 | 10:32:04 | other_values |
4 | 21:22:44 | other_values |
5 | 09:02:04 | other_values |
6 | 11:02:04 | other_values |
Given the following time intervals, I want to count the rows of the previous dataframe according to their time of entry. In addition, I want to get a column with the percentage (2 decimal places) for the resulting count
column.
time_slot | count | percentage |
---|---|---|
00:00-03:00 | 0 | 0.00 |
03:00-06:00 | 0 | 0.00 |
06:00-09:00 | 0 | 0.00 |
09:00-12:00 | 3 | 50.00 |
12:00-15:00 | 0 | 0.00 |
15:00-18:00 | 2 | 33.33 |
18:00-21:00 | 0 | 0.00 |
21:00-00:00 | 1 | 16.67 |
Upvotes: 0
Views: 32
Reputation: 13247
Example
data = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
'entry_time': {0: '16:02:04',
1: '15:02:04',
2: '10:32:04',
3: '21:22:44',
4: '09:02:04',
5: '11:02:04'},
'other_columns': {0: 'other_values',
1: 'other_values',
2: 'other_values',
3: 'other_values',
4: 'other_values',
5: 'other_values'}}
df = pd.DataFrame(data)
df
id entry_time other_columns
0 1 16:02:04 other_values
1 2 15:02:04 other_values
2 3 10:32:04 other_values
3 4 21:22:44 other_values
4 5 09:02:04 other_values
5 6 11:02:04 other_values
Code
use pd.cut
bins = range(0, 25, 3)
labels = ['{}:00 - {}:00'.format(str(i).zfill(2), str(i + 3).zfill(2)) for i in bins[:-1]]
s = pd.cut(pd.to_datetime(df['entry_time']).dt.hour, bins=bins, labels=labels, right=False)
pd.concat([s.value_counts(), s.value_counts(normalize=True).mul(100)], keys=['count', 'percentage'], axis=1).reindex(labels).rename_axis('timeslot')
result:
timeslot count percentage
00:00 - 03:00 0 0.0
03:00 - 06:00 0 0.0
06:00 - 09:00 0 0.0
09:00 - 12:00 3 50.0
12:00 - 15:00 0 0.0
15:00 - 18:00 2 33.3
18:00 - 21:00 0 0.0
21:00 - 24:00 1 16.7
other way
s = pd.to_datetime(df['entry_time']).dt.hour.floordiv(3)
df1 = pd.concat([s.value_counts(), s.value_counts(normalize=True).mul(100)], keys=['count', 'percentage'], axis=1).reindex(range(8)).fillna(0)
df1.index = df1.index.map(lambda i:'{}:00 - {}:00'.format(str(i * 3).zfill(2), str(i * 3 + 3).zfill(2)))
df1.rename_axis('timeslot')
result:
count percentage
timeslot
00:00 - 03:00 0.0 0.0
03:00 - 06:00 0.0 0.0
06:00 - 09:00 0.0 0.0
09:00 - 12:00 3.0 50.0
12:00 - 15:00 0.0 0.0
15:00 - 18:00 2.0 33.3
18:00 - 21:00 0.0 0.0
21:00 - 24:00 1.0 16.7
Upvotes: 1