Reputation: 592
I am quite new to pandas time series, but I think I need them for some applications.
I have a dataset from a voltage recording that was sampled at 2500 Hz for an hour, and it needs to be downsampled to 1500 Hz.
How can I A) create a datetime index/object for this data and B) downsample it to 1500 Hz?
EDIT (here is an example):
import numpy as np

original_hz = 1/2500 # sample period in seconds at 2500 Hz
downsample_to_hz = 1/1500 # sample period in seconds at 1500 Hz
# 1 second time index at the two sampling frequencies
time_2500hz = np.arange(0, 1, original_hz)
time_1500hz = np.arange(0, 1, downsample_to_hz)
# example sine wave of recording at 2500 Hz
amplitude = np.sin(time_2500hz)
How do I downsample and interpolate amplitude so it lines up with the time index from sampling at 1500hz?
I would like to use pandas timeseries (https://pandas.pydata.org/docs/user_guide/timeseries.html) for this but examples in numpy will also be useful.
EDIT: Including this as I feel others may find it useful. There is a scipy function that almost does what I want: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.decimate.html. However, the decimation factor q needs to be an int, which does not work for all situations. Quite a shame really; I don't see why they limited the function like that.
Upvotes: 0
Views: 112
Reputation: 261810
Here is one approach using resample and resample.interpolate:
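import numpy as np
import pandas as pd

# making a sine wave for the example to be more visual
# not needed in the real case!
# (time_2500hz is the 2500 Hz time axis from the question)
amplitude = np.sin(time_2500hz)
# set up DataFrame
df = pd.DataFrame({'signal': amplitude})
# Timedelta index at 2500Hz
df.index = pd.to_timedelta(df.index/2500, unit='s')
# calculate resampling period
# must be an integer so use microseconds
# (1_000_000//1500 = 666 µs, i.e. about 1501.5 Hz, hence 1501 rows below)
rate_microsec = 1_000_000//1500
# resample with interpolation
# ('us' is the microsecond alias; 'U' is deprecated since pandas 2.2)
df2 = df.resample(f'{rate_microsec}us').interpolate()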
Interpolated output:
signal
0 days 00:00:00 0.000000
0 days 00:00:00.000666 0.000664
0 days 00:00:00.001332 0.001328
0 days 00:00:00.001998 0.001992
0 days 00:00:00.002664 0.002656
... ...
0 days 00:00:00.996336 0.803052
0 days 00:00:00.997002 0.803052
0 days 00:00:00.997668 0.803052
0 days 00:00:00.998334 0.803052
0 days 00:00:00.999000 0.803052
[1501 rows x 1 columns]
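One caveat about the output above: the last rows are flat at 0.803052. As far as I can tell, this is because resample(...).interpolate() first reindexes to the new 666 µs grid and only then interpolates, so only the original samples that land exactly on a new tick act as anchors. A possible workaround, shown here as a minimal sketch rather than a definitive fix, is to interpolate on the union of the old and new timestamps so that every original sample is used. The names fs_in, fs_out, s, new_index and s_1500 are made up for this illustration, and linear interpolation is assumed to be acceptable.

```python
import numpy as np
import pandas as pd

fs_in, fs_out = 2500, 1500  # sampling rates in Hz (illustrative names)

# one second of example data on a Timedelta index at 2500 Hz
t_in = np.arange(fs_in) / fs_in
s = pd.Series(np.sin(t_in), index=pd.to_timedelta(t_in, unit='s'))

# target index at 1500 Hz; timestamps are rounded to whole nanoseconds
t_out = np.arange(fs_out) / fs_out
new_index = pd.to_timedelta(t_out, unit='s')

# keep every original sample as an anchor, interpolate linearly
# along the index values, then keep only the new timestamps
s_1500 = (
    s.reindex(s.index.union(new_index))
     .interpolate(method='index')
     .reindex(new_index)
)
```

For the full one-hour recording the same pattern should apply with np.arange(3600 * fs_in) in place of np.arange(fs_in), at the cost of a temporarily larger intermediate index.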
2500Hz data before resampling:
signal
0 days 00:00:00 0.000000
0 days 00:00:00.000400 0.000400
0 days 00:00:00.000800 0.000800
0 days 00:00:00.001200 0.001200
0 days 00:00:00.001600 0.001600
... ...
0 days 00:00:00.998000 0.840389
0 days 00:00:00.998400 0.840605
0 days 00:00:00.998800 0.840822
0 days 00:00:00.999200 0.841038
0 days 00:00:00.999600 0.841255
[2500 rows x 1 columns]
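Since the question also asks for numpy examples, here is a minimal sketch using np.interp to evaluate the 2500 Hz signal on an exact 1500 Hz time axis. The names fs_in, fs_out, duration and amplitude_1500 are made up for this illustration. Note the assumption: this is plain linear interpolation with no low-pass filtering, so any content above 750 Hz in a real recording would alias.

```python
import numpy as np

fs_in, fs_out = 2500, 1500  # sampling rates in Hz (illustrative names)
duration = 1.0              # seconds of data in this example

# time axes in seconds at the two sampling rates
t_in = np.arange(int(duration * fs_in)) / fs_in
t_out = np.arange(int(duration * fs_out)) / fs_out

# example sine wave recorded at 2500 Hz
amplitude = np.sin(t_in)

# linearly interpolate the 2500 Hz samples at the 1500 Hz time points
amplitude_1500 = np.interp(t_out, t_in, amplitude)
```

The result has one value per entry of t_out, so it lines up with the 1500 Hz time index directly.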
Output of the resample approach above on the example from your other question (8643 rows of input, 5191 rows of output). NB. the "time" column is also resampled, which might be unwanted but can easily be fixed with .assign(time=lambda d: range(len(d))).
time channel1 channel2 channel3
0 days 00:00:00 0.000 0.000000 0.000000 0.000000
0 days 00:00:00.000666 1.665 -0.000011 -0.000044 -0.000022
0 days 00:00:00.001332 3.330 -0.000022 -0.000088 -0.000044
0 days 00:00:00.001998 4.995 -0.000033 -0.000132 -0.000066
0 days 00:00:00.002664 6.660 -0.000044 -0.000176 -0.000088
... ... ... ... ...
0 days 00:00:03.453876 8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.454542 8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.455208 8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.455874 8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.456540 8325.000 -0.054687 -0.218749 -0.109374
[5191 rows x 4 columns]
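Regarding the integer limit of scipy.signal.decimate mentioned in the question: scipy.signal.resample_poly accepts a rational factor given as separate up and down integers, and applies a low-pass FIR filter as part of the resampling. For 2500 Hz to 1500 Hz the ratio is 3/5. The following is a minimal sketch under assumptions: the 5 Hz test tone and the variable names are made up for illustration, and the default filter settings are used without checking that they suit a real voltage recording.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 2500, 1500  # sampling rates in Hz (illustrative names)

# reduce the rate ratio to lowest terms: 1500/2500 -> 3/5
ratio = Fraction(fs_out, fs_in)
up, down = ratio.numerator, ratio.denominator

# one second of a 5 Hz test tone sampled at 2500 Hz (made-up test signal)
t_in = np.arange(fs_in) / fs_in
x = np.sin(2 * np.pi * 5 * t_in)

# upsample by 3, low-pass filter, then downsample by 5
y = resample_poly(x, up, down)

# matching time axis in seconds at 1500 Hz
t_out = np.arange(len(y)) / fs_out
```

Because the signal is treated as zero outside its boundaries, the first and last few output samples can be less accurate than the interior.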
Upvotes: 1