useruser123

Reputation: 89

Group data into bins of 30 minutes

I have a .csv file with some data. There is only one column in this file, which contains timestamps. I need to organize that data into bins of 30 minutes. This is what my data looks like:

Timestamp
04/01/2019 11:03
05/01/2019 16:30
06/01/2019 13:19
08/01/2019 13:53
09/01/2019 13:43

So in this case, the last two data points would be grouped together in the bin that includes all the data from 13:30 to 14:00.
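
To make the rule concrete, this is roughly the rounding I have in mind, sketched in plain Python (the helper name is just for illustration):

from datetime import datetime

def floor_to_30min(ts):
    # Drop seconds and round the minutes down to the nearest half hour,
    # e.g. 13:43 and 13:53 both map to the 13:30 bin.
    return ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)

print(floor_to_30min(datetime(2019, 1, 8, 13, 53)).time())  # 13:30:00
print(floor_to_30min(datetime(2019, 1, 9, 13, 43)).time())  # 13:30:00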

This is what I have already tried:

import pandas as pd

df = pd.read_csv('book.csv')
df['Timestamp'] = pd.to_datetime(df.Timestamp)
df.groupby(pd.Grouper(key='Timestamp', freq='30min')).count().dropna()

I am getting around 7000 rows, one for every 30-minute interval between the first and last timestamp, with the count next to each. Most of the counts are 0, and dropna() does not remove them because count() returns 0 rather than NaN for empty bins:

2019-09-01 03:00:00   0
2019-09-01 03:30:00   0
2019-09-01 04:00:00   0
 ...
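
(I know I could drop the empty intervals from that result with something like the sketch below, but that still leaves a separate bin for every date, whereas I only want one row per time of day.)

counts = df.groupby(pd.Grouper(key='Timestamp', freq='30min')).size()
counts = counts[counts > 0]  # keep only the non-empty intervals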

I want bins only for the half-hours that actually appear in my dataset, grouped by time of day regardless of the date. I want to see something like this:

Time         Count
11:00:00      1
13:00:00      1
13:30:00      2   (we have two data points in this interval)
16:30:00      1

Thanks in advance!

Upvotes: 1

Views: 1105

Answers (1)

Space Impact

Reputation: 13255

Use groupby.size as follows:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# Floor each timestamp to its 30-minute bin, keep only the time of day,
# then count the rows in each bin.
df = (df.Timestamp.dt.floor('30min')
        .dt.time.to_frame()
        .groupby('Timestamp').size()
        .reset_index(name='Count'))

Or, as per the suggestion by jpp, with value_counts (sorted back into chronological order so it matches the output below):

df = (df.Timestamp.dt.floor('30min')
        .dt.time.value_counts()   # counts per bin, ordered by frequency
        .sort_index()             # restore chronological order
        .rename_axis('Timestamp')
        .reset_index(name='Count'))


print(df)
  Timestamp  Count
0  11:00:00      1
1  13:00:00      1
2  13:30:00      2
3  16:30:00      1
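
For reference, a minimal end-to-end sketch starting from the CSV (book.csv is taken from the question; dayfirst=True is a guess based on the DD/MM/YYYY-looking sample and may not be needed):

import pandas as pd

df = pd.read_csv('book.csv')
# Parse the strings as datetimes; dayfirst=True assumes a DD/MM/YYYY format.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], dayfirst=True)

# Floor to 30-minute bins, keep only the time of day, and count rows per bin.
out = (df.Timestamp.dt.floor('30min')
         .dt.time.to_frame()
         .groupby('Timestamp').size()
         .reset_index(name='Count'))
print(out)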

Upvotes: 3
