J. P. Petersen

Reputation: 5031

Generating an evenly sampled array from unevenly sampled data in NumPy

I want to reduce the amount of data for plots and analysis. I'm using Python and NumPy. The data is unevenly sampled, so there is an array of timestamps and an array of corresponding values. I want there to be at least a certain amount of time between consecutive data points. Here is a simple solution in Python that finds the indices of samples that are at least one second apart:

import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0 ]) # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0 ])

idx = [0]
last_t = t[0]
min_dif = 1.0 # Minimum distance between samples in time
for i in range(1, len(t)):
    if last_t + min_dif <= t[i]:
        last_t = t[i]
        idx.append(i)

If we look at the result:

--> print(idx)
[0, 4, 5, 6, 9]

--> print(t[idx])
[ 0.  1.  2.  4.  5.]

How can this be done more efficiently, especially if the arrays are really long? Are there built-in NumPy or SciPy methods that do something similar?

Upvotes: 7

Views: 4371

Answers (4)

Pierre GM

Reputation: 20339

Like @reptilicus, I'd suggest using pandas, but you may want to try something with np.histogram.

First, build the bin edges (intervals of min_dif seconds) you would need:

>>> bins = np.arange(t[0], t[-1]+min_dif, min_dif) - 1e-12

The t[-1]+min_dif is there to make sure we take the last point. The -1e-12 is a hack to avoid having the 4.0 of your example counted in the last bin: it's just an offset to make sure we close the intervals on the right.

>>> (counts, _) = np.histogram(t, bins)
>>> counts
array([4, 1, 1, 0, 3])
>>> counts.cumsum()
array([4, 5, 6, 6, 9])

So v[0:4] is your first sample, v[4:5] your second, and so on: each cumulative count marks where one interval's slice ends and the next begins.
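As a minimal sketch of how the slices above could be turned into indices, assuming t is sorted: the names starts, stops and first_idx are illustrative and not part of the original answer. Note that with these bin edges the last sample (t = 5.0) lies just beyond the final edge, which is why the counts sum to 9 rather than 10.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])
min_dif = 1.0

# Same bin edges as in the answer above.
bins = np.arange(t[0], t[-1] + min_dif, min_dif) - 1e-12
counts, _ = np.histogram(t, bins)

# Bin k covers the slice v[starts[k]:stops[k]] (requires sorted t).
stops = counts.cumsum()
starts = stops - counts

# Skip empty bins and keep the first sample of each remaining bin.
first_idx = starts[counts > 0]
print(first_idx)      # indices into t and v
print(t[first_idx])   # the retained timestamps
```

The same starts array could be passed to a reduction such as np.add.reduceat if a per-bin aggregate is wanted instead of a single sample, but empty bins would need separate handling there.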

Upvotes: 4

reptilicus

Reputation: 10397

I would recommend using pandas for this. It is pretty straightforward to generate a regularly spaced time series and then resample the data to a specific frequency. See this and look at the subsection on resampling about halfway down the page.
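A rough sketch of what this could look like, assuming the timestamps are seconds elapsed since the start of the recording. Taking the first value in each one-second bucket is my choice for illustration; the answer does not say which aggregation to use, and mean() or last() would work the same way.

```python
import numpy as np
import pandas as pd

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])
min_dif = 1.0

# Index the values by elapsed time so pandas can bucket them.
s = pd.Series(v, index=pd.to_timedelta(t, unit="s"))

# One bucket per min_dif seconds; keep the first value in each bucket
# and drop buckets that contain no samples.
first_per_bucket = s.resample(pd.Timedelta(seconds=min_dif)).first().dropna()
print(first_per_bucket)
```

Unlike the loop in the question, this works on fixed buckets, so two retained samples can be closer together than min_dif when they sit on either side of a bucket boundary.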

Upvotes: 1

seberg

Reputation: 8975

I cannot think of a solution that does exactly what you want, and this does not seem too elegant to me, but it should do approximately what you want without interpolation. It gives at most one value (the leftmost) for every second:

# Assuming that t is sorted...
# Create all full seconds.
seconds = np.arange(int(t[0]), int(t[-1]) + 1)

# Find the index of the first sample at or after each full second.
idx = np.searchsorted(t, seconds)
idx = np.unique(idx)  # there are duplicates if a second has no data in it

For your example it gives the same result, but in general the spacing between the selected samples is not guaranteed to be at least one second: it can be anything between 0 and 2 seconds.
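The same idea can be sketched for an arbitrary interval length rather than whole seconds. The function name first_index_per_interval and the min_dif parameter are hypothetical, introduced here only for illustration, and t is assumed to be sorted.

```python
import numpy as np

def first_index_per_interval(t, min_dif):
    """Index of the first sample at or after each grid point (t must be sorted)."""
    # Regular grid starting at the first timestamp.
    grid = np.arange(t[0], t[-1] + min_dif, min_dif)
    idx = np.unique(np.searchsorted(t, grid))
    # A grid point past the last timestamp maps to len(t); drop it.
    return idx[idx < len(t)]

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds

idx = first_index_per_interval(t, 1.0)
print(idx)
print(t[idx])
```

Both searchsorted and unique run in compiled code, so this avoids the Python-level loop, at the cost of not enforcing a strict minimum spacing.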

Upvotes: 1

silvado

Reputation: 18157

A simple solution would be by interpolation, using e.g. numpy.interp:

vsampled = np.interp(np.arange(t[0], t[-1]), t, v)

This will not give you the indices of the values, though. It will also generate values by interpolation at times for which the input arrays contain no data.
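A slightly expanded sketch of the interpolation approach, using the example data from the question. The half-step added to the stop value is my own addition so that the grid also covers the final timestamp; the one-liner above stops one step earlier.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])
min_dif = 1.0

# Evenly spaced time grid; the half-step keeps the end point in the grid.
t_even = np.arange(t[0], t[-1] + min_dif / 2, min_dif)

# Linear interpolation of v onto the grid (t must be increasing).
v_even = np.interp(t_even, t, v)
print(t_even)
print(v_even)
```

Here the value at t = 3.0 is interpolated between the samples at 2.0 and 4.0 even though nothing was measured there, which is the behaviour the answer warns about.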

Upvotes: 3
