Reputation: 657
I have some time series data that can be sampled at 1Hz, 10Hz, or 100Hz. The file I load in happens to be 1Hz:
In [6]: data = pd.read_csv("ftp.csv")
In [7]: data.Time
Out[7]:
0 NaN
1 11:30:08 AM
2 11:30:09 AM
3 11:30:10 AM
4 11:30:11 AM
5 11:30:12 AM
6 11:30:13 AM
I convert it to datetime with:
In [8]: time = pd.to_datetime(data.Time)
In [9]: time
Out[9]:
0 NaT
1 2015-03-03 11:30:08
2 2015-03-03 11:30:09
3 2015-03-03 11:30:10
4 2015-03-03 11:30:11
5 2015-03-03 11:30:12
From here, how can I verify what the sampling frequency is? Do I have to do this manually, or can I use a built-in pandas method?
Upvotes: 3
Views: 4541
Reputation: 81
I deal in sampled acceleration data on a regular basis. Typically, the data has some sampling rate jitter (the samples are not always at equal delta-t).
Recently, I had to develop a method to determine the "average sampling rate" to verify that the data were obtained at the correct frequency. The typical Pandas methods were not particularly helpful for me and I could not find something directly on-point. This post was the closest I could find.
I refined the existing answers and added some capability. Hopefully you'll find it useful.
My time data are in datetime64 format (converted from a column of strings in a dataframe via the pd.to_datetime() function) and include the complete date and time.
Data = pd.DataFrame(
["2025-01-17 01:07:19.500976776",
"2025-01-17 01:07:19.501953038",
"2025-01-17 01:07:19.502929501",
"2025-01-17 01:07:19.503906163",
"2025-01-17 01:07:19.504882926",
"2025-01-17 01:07:19.505859488",
"2025-01-17 01:07:19.506835851"],
columns = ['Time'],
dtype='datetime64[ns]')
I first convert the Time column to timedelta64 by subtracting the first value from every entry, which produces a Pandas Series (not a DataFrame):
rel_time = Data.Time - Data.Time.iloc[0]
rel_time
Out[42]:
0 0 days 00:00:00
1 0 days 00:00:00.000976262
2 0 days 00:00:00.001952725
3 0 days 00:00:00.002929387
4 0 days 00:00:00.003906150
5 0 days 00:00:00.004882712
6 0 days 00:00:00.005859075
Name: Time, dtype: timedelta64[ns]
Then, I use the dt accessor of the Series to get the total seconds for each time. Note that the output type is float64 and the units are implicitly seconds:
rel_time = rel_time.dt.total_seconds()
rel_time
Out[45]:
0 0.000000
1 0.000976
2 0.001953
3 0.002929
4 0.003906
5 0.004883
6 0.005859
Name: Time, dtype: float64
Finally, I'm ready to start investigating frequency in Hz. I use Series.diff() and Series.describe() to gather statistics on the sample spacing, including a measure of the jitter:
dTime = rel_time.diff().describe()
dTime
Out[49]:
count 6.000000e+00
mean 9.765125e-04
std 1.871371e-07
min 9.762620e-04
25% 9.763880e-04
50% 9.765125e-04
75% 9.766370e-04
max 9.767630e-04
Name: Time, dtype: float64
The average sample rate in Hz is:
1./dTime['mean']
Out[50]: 1024.052431484492
and the standard deviation of the sampling time (in seconds) is:
dTime['std']
Out[51]: 1.8713711550629458e-07
Although this is off-topic, I usually have specifications for the relative SD:
dTime['std']/dTime['mean']
Out[62]: 0.0001916382181552152
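The steps above can be collected into one helper. This is only a minimal sketch: the function name sampling_stats and the dictionary keys are hypothetical names chosen for illustration, and it assumes the input is a Series of datetime64 values like the Time column above.

```python
import pandas as pd

def sampling_stats(times: pd.Series) -> dict:
    """Summarize sample spacing for a Series of datetime64 timestamps.

    Hypothetical helper: the name and the returned keys are illustrative.
    """
    # Drop missing timestamps, take successive differences, convert to seconds.
    dt = times.dropna().diff().dt.total_seconds().dropna()
    mean_dt = dt.mean()
    std_dt = dt.std()
    return {
        "mean_rate_hz": 1.0 / mean_dt,      # average sampling rate
        "std_dt_s": std_dt,                 # jitter, in seconds
        "relative_std": std_dt / mean_dt,   # jitter relative to the mean spacing
    }

# Same timestamps as the example data above.
times = pd.Series(pd.to_datetime([
    "2025-01-17 01:07:19.500976776",
    "2025-01-17 01:07:19.501953038",
    "2025-01-17 01:07:19.502929501",
    "2025-01-17 01:07:19.503906163",
    "2025-01-17 01:07:19.504882926",
    "2025-01-17 01:07:19.505859488",
    "2025-01-17 01:07:19.506835851",
]))

stats = sampling_stats(times)
print(stats)
```

Differencing the timestamps directly gives the same spacings as subtracting the first value and then differencing, so the intermediate rel_time step is skipped here.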
The benefits of looking at sampling rate this way are
Note, finally, that I injected some extra jitter into the data to make things more interesting.
Upvotes: 0
Reputation: 394091
One method, after converting to datetime64: if the sampling rate is constant, call diff() to calculate the difference between consecutive rows. These differences should all be identical, so you can compare them against a np.timedelta64 value. For your sample data this would be:
In [277]:
all(df.datetime.diff()[1:] == np.timedelta64(1, 's'))
Out[277]:
True
In [278]:
df.datetime.diff()
Out[278]:
0
1 NaT
2 00:00:01
3 00:00:01
4 00:00:01
5 00:00:01
6 00:00:01
Name: datetime, dtype: timedelta64[ns]
In [279]:
df.datetime.diff()[1:] == np.timedelta64(1, 's')
Out[279]:
0
2 True
3 True
4 True
5 True
6 True
Name: datetime, dtype: bool
To check whether the frequency is 10Hz or 100Hz, change the value passed to np.timedelta64:
for 10Hz: np.timedelta64(100, 'ms')
and for 100Hz: np.timedelta64(10, 'ms')
The np.timedelta64 units can be found here: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-and-timedelta-arithmetic
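As a rough sketch of how this comparison could be run against all three expected rates at once, the snippet below takes the median spacing and matches it to the nearest candidate within a tolerance. The function name nominal_rate, its parameters, and the 1% tolerance are assumptions made for illustration, and the time strings are copied from the question. It also calls pd.infer_freq, which is a real pandas function but only returns a result when the spacing is regular, and the alias it returns for seconds differs between pandas versions.

```python
import pandas as pd

def nominal_rate(times, candidates_hz=(1, 10, 100), rel_tol=0.01):
    """Return the candidate rate (Hz) matching the median spacing, else None.

    Hypothetical helper: name, candidates, and tolerance are illustrative.
    """
    # Differences between consecutive valid timestamps.
    dt = times.dropna().diff().dropna()
    median_s = dt.median().total_seconds()
    for hz in candidates_hz:
        expected_s = 1.0 / hz
        # Accept the candidate if the median spacing is within rel_tol of it.
        if abs(median_s - expected_s) <= rel_tol * expected_s:
            return hz
    return None

# Time strings as in the question, including the leading missing value.
raw = pd.Series([None, "11:30:08 AM", "11:30:09 AM", "11:30:10 AM",
                 "11:30:11 AM", "11:30:12 AM", "11:30:13 AM"])
times = pd.to_datetime(raw, format="%I:%M:%S %p")

rate = nominal_rate(times)
# True only if every spacing is exactly the same.
is_uniform = times.dropna().diff().dropna().nunique() == 1
# Built-in inference; needs at least three timestamps and no missing values.
inferred = pd.infer_freq(pd.DatetimeIndex(times.dropna()))

print(rate, is_uniform, inferred)
```

Using the median rather than an exact equality test keeps the check usable when a few samples are late or missing, at the cost of hiding that irregularity, which is why the exact uniformity check is reported separately.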
Upvotes: 4