ditrauth

Reputation: 111

How to count events in predefined time ranges

I want to count the events in each one-second interval of a CSV data file and draw a histogram from the results, but I don't understand how to get the number of events per second.

My code:

from matplotlib import pyplot as pl
import pandas as pd
import numpy as np

def read_data():
    df = pd.read_csv("test.csv", usecols=['time', 'unix_time', 'name'])
    df['time'] = pd.to_datetime(df['time'])
    df['unix_time'] = (df['unix_time']).astype(int)
    df.info()

    i = 1

    time_counts = df.groupby((3600 * df.time.dt.minute + df.time.dt.second) // i * i)['time'].count()
    print(time_counts)


if __name__ == "__main__":
    read_data()

The output looks strange:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   time       33 non-null     datetime64[ns]
 1   unix_time  33 non-null     int32         
 2   name       33 non-null     object        
dtypes: datetime64[ns](1), int32(1), object(1)
memory usage: 788.0+ bytes

time
18 1
25217 1
43209 1
43219 1
46804 1
54047 1
61241 1
64815 1
64833 1
68402 1
75620 1
79235 1
82806 1
82837 2
86407 1
86446 1
93625 1
97254 1
104446 1
140438 1
144050 1
162025 1
169250 1
180050 1
183623 1
183658 1
194404 1
194412 2
194433 1
194438 1
205219 1
Name: time, dtype: int64

Upvotes: 0

Views: 32

Answers (1)

jezrael

Reputation: 862671

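The grouping key in the question multiplies the minute by 3600 instead of 60 and ignores the hour and date, so the resulting integers are not real second offsets. Group on the datetime column itself instead, using pd.Grouper with a one-second frequency: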

df['time'] = pd.to_datetime(df['time'])

time_counts = df.groupby(pd.Grouper(freq='1s', key='time'))['time'].count()
print(time_counts)
time
2022-12-15 08:00:18    1
2022-12-15 08:00:19    0
2022-12-15 08:00:20    0
2022-12-15 08:00:21    0
2022-12-15 08:00:22    0
                      ..
2022-12-15 08:57:15    0
2022-12-15 08:57:16    0
2022-12-15 08:57:17    0
2022-12-15 08:57:18    0
2022-12-15 08:57:19    1
Freq: S, Name: time, Length: 3422, dtype: int64
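
To make the zero-filled bins easier to see, here is a minimal self-contained sketch of the same Grouper call. The three timestamps and the `name` values are made-up stand-ins for test.csv, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data standing in for test.csv (not the asker's file).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-12-15 08:00:18.120",
        "2022-12-15 08:00:18.900",
        "2022-12-15 08:00:21.050",
    ]),
    "name": ["a", "b", "c"],
})

# One bin per second from the first event to the last;
# seconds without any event are kept and counted as 0.
time_counts = df.groupby(pd.Grouper(freq="1s", key="time"))["time"].count()
print(time_counts)
```

With this sample the result has four rows (08:00:18 through 08:00:21) with counts 2, 0, 0, 1.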

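Or use Series.dt.floor to truncate each timestamp to a whole second (dropping the milliseconds). Unlike Grouper, this returns only the seconds in which at least one event occurred:

df['time'] = pd.to_datetime(df['time'])

time_counts = df.groupby(df['time'].dt.floor('s'))['time'].count()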

print(time_counts)
time
2022-12-15 08:00:18    1
2022-12-15 08:07:17    1
2022-12-15 08:12:09    1
2022-12-15 08:12:19    1
2022-12-15 08:13:04    1
2022-12-15 08:15:47    1
2022-12-15 08:17:41    1
2022-12-15 08:18:15    1
2022-12-15 08:18:33    1
2022-12-15 08:19:02    1
2022-12-15 08:21:20    1
2022-12-15 08:22:35    1
2022-12-15 08:23:06    1
2022-12-15 08:23:37    2
2022-12-15 08:24:07    1
2022-12-15 08:24:46    1
2022-12-15 08:26:25    1
2022-12-15 08:27:54    1
2022-12-15 08:29:46    1
2022-12-15 08:39:38    1
2022-12-15 08:40:50    1
2022-12-15 08:45:25    1
2022-12-15 08:47:50    1
2022-12-15 08:50:50    1
2022-12-15 08:51:23    1
2022-12-15 08:51:58    1
2022-12-15 08:54:04    1
2022-12-15 08:54:12    2
2022-12-15 08:54:33    1
2022-12-15 08:54:38    1
2022-12-15 08:57:19    1
Name: time, dtype: int64
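
The two approaches differ only in whether empty seconds appear. The sketch below, again on made-up timestamps rather than the real CSV, checks that and shows one possible way to zero-fill the floor result with reindex; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical sample data standing in for test.csv (not the asker's file).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-12-15 08:00:18.120",
        "2022-12-15 08:00:18.900",
        "2022-12-15 08:00:21.050",
    ]),
})

# Counts keyed by the timestamp truncated to the second (no empty bins).
floored = df.groupby(df["time"].dt.floor("s"))["time"].count()

# Counts per one-second bin, including empty bins.
binned = df.groupby(pd.Grouper(freq="1s", key="time"))["time"].count()

# Dropping the empty bins from the Grouper result should match the floor result.
nonzero = binned[binned > 0]
same = (floored.tolist() == nonzero.tolist()
        and list(floored.index) == list(nonzero.index))

# Going the other way: add the missing seconds to the floor result as zeros.
full_index = pd.date_range(floored.index.min(), floored.index.max(), freq="1s")
filled = floored.reindex(full_index, fill_value=0)

print(same)
print(filled)
```

For the histogram, either Series can be drawn as a bar chart, for example with `time_counts.plot(kind='bar')`; the zero-filled version keeps the time axis evenly spaced.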

Upvotes: 3
