Angus Campbell

Reputation: 592

Creating a pandas time series with a sub-millisecond datetime index and downsampling it

I am quite a novice with pandas time series, but I think I need them for some applications.

I have a dataset of a voltage recording that was sampled at 2500 Hz for an hour, and it needs to be downsampled to 1500 Hz.

How can I A) create a datetime index/object for this data and B) downsample it to 1500 Hz?
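For part A, a minimal sketch assuming a hypothetical recording start time and a hypothetical 5 Hz test tone in place of the real voltage data. At 2500 Hz the sample period is exactly 400 µs, so pandas can generate the index directly with a fixed frequency:

```python
import numpy as np
import pandas as pd

fs = 2500                       # original sampling rate in Hz
n_samples = fs * 2              # 2 seconds for the sketch; an hour would be fs * 3600
t = np.arange(n_samples) / fs   # sample times in seconds
signal = np.sin(2 * np.pi * 5 * t)  # hypothetical 5 Hz test tone, not real data

# Relative index: at 2500 Hz the period is exactly 400 microseconds
td_index = pd.timedelta_range(start="0s", periods=n_samples, freq="400us")

# Absolute index, assuming a hypothetical recording start time
dt_index = pd.date_range(start="2024-01-01 12:00:00", periods=n_samples, freq="400us")

df = pd.DataFrame({"signal": signal}, index=dt_index)
```

For the full hour, n_samples would be fs * 3600 = 9,000,000 rows. A 1500 Hz grid has no whole-microsecond period (about 666.67 µs), so that index is easier to build from float seconds with pd.to_timedelta.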

EDIT (here is an example):

import numpy as np

original_hz = 1/2500 # sampling period in seconds at 2500 Hz
downsample_to_hz = 1/1500 # sampling period in seconds at 1500 Hz

# 1 second time index at the two sampling frequencies
time_2500hz = np.arange(0, 1, original_hz)
time_1500hz = np.arange(0, 1, downsample_to_hz)

# example sine wave of recording at 2500 Hz
amplitude = np.sin(time_2500hz)

How do I downsample and interpolate amplitude so that it lines up with the time index sampled at 1500 Hz?

I would like to use pandas timeseries (https://pandas.pydata.org/docs/user_guide/timeseries.html) for this but examples in numpy will also be useful.
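A minimal NumPy-only sketch of part B, assuming plain linear interpolation is acceptable. np.interp does no anti-aliasing filtering, so any content above 750 Hz (the new Nyquist frequency) would alias. The time axes are built from integer sample counts, since np.arange with a non-integer float step can occasionally give an unexpected length:

```python
import numpy as np

original_fs = 2500   # Hz
target_fs = 1500     # Hz
duration = 1.0       # seconds, as in the example above

# Time axes built from integer sample counts divided by the sampling rate
t_orig = np.arange(int(duration * original_fs)) / original_fs
t_new = np.arange(int(duration * target_fs)) / target_fs

amplitude = np.sin(t_orig)

# Linearly interpolate the 2500 Hz samples onto the 1500 Hz time points
# (no low-pass filtering is applied first)
amplitude_1500 = np.interp(t_new, t_orig, amplitude)
```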

EDIT: Including this as I feel others may find it useful. There is a SciPy function that almost does what I want: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.decimate.html. However, its decimation factor q needs to be an integer, which does not work in all situations (going from 2500 Hz to 1500 Hz is a non-integer factor of 5/3). Quite a shame really; I don't see why they limited the function like that.
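One possible workaround, sketched here on the example sine wave rather than tested on the real recording: scipy.signal.resample_poly takes separate integer up and down factors, so the 5/3 ratio can be expressed as up=3, down=5. Unlike plain interpolation, it applies an FIR anti-aliasing low-pass filter. The zero padding at the edges distorts the first and last few output samples.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

original_fs = 2500
target_fs = 1500

# Reduce target/original to lowest terms: 1500/2500 -> 3/5
g = gcd(original_fs, target_fs)
up, down = target_fs // g, original_fs // g

t_orig = np.arange(original_fs) / original_fs   # 1 second at 2500 Hz
amplitude = np.sin(t_orig)

# Upsample by `up`, low-pass filter, then keep every `down`-th sample
amplitude_1500 = resample_poly(amplitude, up, down)
t_new = np.arange(len(amplitude_1500)) / target_fs
```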

Upvotes: 0

Views: 112

Answers (1)

mozway

Reputation: 261810

Here is one approach using resample and resample.interpolate:

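import numpy as np
import pandas as pd

# making a sin wave for the example to be more visual
# not needed in the real case! (time_2500hz is defined in the question)
amplitude = np.sin(time_2500hz)

# set up DataFrame
df = pd.DataFrame({'signal': amplitude})
# Timedelta index at 2500Hz
df.index = pd.to_timedelta(df.index/2500, unit='s')

# calculate resampling period
# must be a whole number of units, so use microseconds
# (666 us, i.e. about 1501.5 Hz rather than exactly 1500 Hz)
rate_microsec = 1_000_000//1500

# resample with interpolation ('us' = microseconds; the older 'U' alias is deprecated in recent pandas)
df2 = df.resample(f'{rate_microsec}us').interpolate()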

Interpolated output:

                          signal
0 days 00:00:00         0.000000
0 days 00:00:00.000666  0.000664
0 days 00:00:00.001332  0.001328
0 days 00:00:00.001998  0.001992
0 days 00:00:00.002664  0.002656
...                          ...
0 days 00:00:00.996336  0.803052
0 days 00:00:00.997002  0.803052
0 days 00:00:00.997668  0.803052
0 days 00:00:00.998334  0.803052
0 days 00:00:00.999000  0.803052

[1501 rows x 1 columns]

2500Hz data before resampling:

                          signal
0 days 00:00:00         0.000000
0 days 00:00:00.000400  0.000400
0 days 00:00:00.000800  0.000800
0 days 00:00:00.001200  0.001200
0 days 00:00:00.001600  0.001600
...                          ...
0 days 00:00:00.998000  0.840389
0 days 00:00:00.998400  0.840605
0 days 00:00:00.998800  0.840822
0 days 00:00:00.999200  0.841038
0 days 00:00:00.999600  0.841255

[2500 rows x 1 columns]
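Note that in the output above, resample(...).interpolate() appears to interpolate only between original samples that fall exactly on the 666 µs grid (these coincide every 133200 µs). The value at 0.000666 s is 0.000664 rather than sin(0.000666) ≈ 0.000666, and the tail stays flat at 0.803052 ≈ sin(0.9324), the last timestamp shared by both grids. Newer pandas versions may behave differently. A sketch of an alternative that interpolates from every original sample onto an exact 1500 Hz grid, assuming linear interpolation in time is acceptable:

```python
import numpy as np
import pandas as pd

original_fs, target_fs = 2500, 1500

t_orig = np.arange(original_fs) / original_fs   # 1 second of samples
df = pd.DataFrame({"signal": np.sin(t_orig)},
                  index=pd.to_timedelta(t_orig, unit="s"))

# Exact 1500 Hz grid from float seconds (pandas stores it in nanoseconds)
t_new = np.arange(target_fs) / target_fs
new_index = pd.to_timedelta(t_new, unit="s")

# Insert the new timestamps among the old ones, interpolate linearly
# using the index values as x, then keep only the new timestamps
df_1500 = (
    df.reindex(df.index.union(new_index))
      .interpolate(method="index")
      .reindex(new_index)
)
```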

addendum

Output of my approach on the example from your other question (8643 input rows, 5191 output rows). NB: the "time" column is also resampled, which might be unwanted, but this can easily be fixed with .assign(time=lambda d: range(len(d))).
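A sketch of that fix applied right after the resampling step, using a hypothetical two-channel DataFrame of random numbers with the same row count in place of the real recording (the column names and data are assumptions):

```python
import numpy as np
import pandas as pd

fs = 2500
n = 8643  # same input row count as the example mentioned above
rng = np.random.default_rng(0)

# Hypothetical recording with a sample-counter "time" column
df = pd.DataFrame({
    "time": np.arange(n),
    "channel1": rng.standard_normal(n),
    "channel2": rng.standard_normal(n),
})
df.index = pd.to_timedelta(df.index / fs, unit="s")

rate_microsec = 1_000_000 // 1500   # 666 us, i.e. about 1501.5 Hz
df2 = (
    df.resample(f"{rate_microsec}us")
      .interpolate()
      # replace the interpolated "time" column with a fresh 0..N-1 counter
      .assign(time=lambda d: range(len(d)))
)
```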

                            time  channel1  channel2  channel3
0 days 00:00:00            0.000  0.000000  0.000000  0.000000
0 days 00:00:00.000666     1.665 -0.000011 -0.000044 -0.000022
0 days 00:00:00.001332     3.330 -0.000022 -0.000088 -0.000044
0 days 00:00:00.001998     4.995 -0.000033 -0.000132 -0.000066
0 days 00:00:00.002664     6.660 -0.000044 -0.000176 -0.000088
...                          ...       ...       ...       ...
0 days 00:00:03.453876  8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.454542  8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.455208  8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.455874  8325.000 -0.054687 -0.218749 -0.109374
0 days 00:00:03.456540  8325.000 -0.054687 -0.218749 -0.109374

[5191 rows x 4 columns]

Upvotes: 1
