Reputation: 5031
I want to reduce the amount of data used for plots and analysis. I'm using Python and NumPy. The data is unevenly sampled, so there is an array of timestamps and an array of corresponding values. I want at least a certain amount of time between consecutive data points. Here is a simple solution in Python that finds the indices of samples that are at least one second apart:
import numpy as np
t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0 ]) # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0 ])
idx = [0]
last_t = t[0]
min_dif = 1.0 # Minimum distance between samples in time
for i in range(1, len(t)):
    if last_t + min_dif <= t[i]:
        last_t = t[i]
        idx.append(i)
If we look at the result:
--> print idx
[0, 4, 5, 6, 9]
--> print t[idx]
[ 0. 1. 2. 4. 5.]
The question is how this can be done more efficiently, especially if the arrays are really long. Are there any built-in NumPy or SciPy methods that do something similar?
Upvotes: 7
Views: 4371
Reputation: 20339
While, like @1443118, I'd suggest using pandas, you may want to try something with np.histogram.
First, get an idea of the number of bins (intervals of min_dif) you would need:
>>> bins = np.arange(t[0], t[-1]+min_dif, min_dif) - 1e-12
The t[-1]+min_dif is to ensure we take the last point, and the -1e-12 is a hack to avoid having the 4.0 of your example counted in the last bin: it's just an offset to make sure we close the intervals on the right.
>>> (counts, _) = np.histogram(t, bins)
>>> counts
array([4, 1, 1, 0, 3])
>>> counts.cumsum()
array([4, 5, 6, 6, 9])
So, v[0:4] is your first sample, v[4:5] your second... you get the idea.
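As a rough sketch of how those counts could be turned into indices, the start of each bin is the cumulative count of all earlier bins. The names starts and first_idx are introduced here for illustration only and are not part of the answer above.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
min_dif = 1.0

# Bin edges as in the answer above, shifted slightly to the left.
bins = np.arange(t[0], t[-1] + min_dif, min_dif) - 1e-12
counts, _ = np.histogram(t, bins)

# Index in t (assumed sorted) at which each bin starts:
# the number of samples that fall into all earlier bins.
starts = np.concatenate(([0], counts.cumsum()[:-1]))

# Keep only bins that actually contain samples.
first_idx = starts[counts > 0]
print(first_idx)  # [0 4 5 6]
```

Note that with these edges the final sample at 5.0 lies just past the last edge, so np.histogram ignores it and it does not appear in first_idx. Extending the edges by one more interval would be one way to include it.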
Upvotes: 4
Reputation: 10397
I would recommend using pandas for this. It is pretty straightforward to generate regularly spaced time series and then resample the data to a specific frequency. See this and look at the subsection on resampling about halfway down the page.
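A minimal sketch of what that could look like, assuming the timestamps are seconds and that taking the first value in each one-second bin is acceptable. The choice of a TimedeltaIndex and of first() is an assumption made here, not something the answer specifies.

```python
import numpy as np
import pandas as pd

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])

# Index the values by elapsed time so pandas can bin them.
s = pd.Series(v, index=pd.to_timedelta(t, unit="s"))

# One value per one-second bin: the first sample in the bin.
# Bins without any samples come back as NaN and are dropped.
resampled = s.resample("1s").first().dropna()
print(resampled)
```

The index of resampled holds the start of each bin rather than the original timestamps, so this is a binning approach and not an exact equivalent of the loop in the question.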
Upvotes: 1
Reputation: 8975
I cannot think of a solution that does exactly what you want, and this does not seem too elegant to me, but it should do approximately what you want without interpolation. It gives at most one value (the leftmost) for every second:
# Assuming that t is sorted...
# Create all full seconds.
seconds = np.arange(int(t[0]), int(t[-1]) + 1)
# Find the index of the first sample at or after each full second.
idx = np.searchsorted(t, seconds)
idx = np.unique(idx) # there might be duplicates if a second has no data in it.
For your example it gives the same result, but it will usually allow smaller or larger differences of course (anything between 0 and 2 seconds)...
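If the exact spacing rule from the loop in the question is needed, one possible variation is to keep the loop but let np.searchsorted jump straight to the next sample that qualifies, so the Python-level loop runs once per kept sample instead of once per input sample. The function name thin_by_min_gap is hypothetical, and t is assumed to be sorted.

```python
import numpy as np

def thin_by_min_gap(t, min_dif):
    """Indices into sorted t such that kept samples are >= min_dif apart."""
    t = np.asarray(t)
    n = len(t)
    if n == 0:
        return np.array([], dtype=int)
    idx = [0]
    i = 0
    while True:
        # Search only to the right of the last kept sample for the first
        # timestamp that is at least min_dif later. The i + 1 offset
        # guarantees progress even when min_dif is 0.
        i = i + 1 + int(np.searchsorted(t[i + 1:], t[i] + min_dif, side="left"))
        if i >= n:
            break
        idx.append(i)
    return np.array(idx)

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
print(thin_by_min_gap(t, 1.0))  # [0 4 5 6 9]
```

Each search is a binary search, so this should help most when many samples are skipped between kept ones. When almost every sample is kept, it is unlikely to beat the plain loop.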
Upvotes: 1
Reputation: 18157
A simple solution would be interpolation, using e.g. numpy.interp:
vsampled = numpy.interp(numpy.arange(t[0], t[-1]), t, v)
This will not give you the indices of the values though. However, it will generate values by interpolation even for points in t where no data in the input arrays is available.
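For illustration, here is the same idea applied to the example data from the question. The grid below is extended by one step so that the final timestamp is included, which is a small variation on the call above; the name grid is introduced here.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])
min_dif = 1.0

# Regular grid from the first to the last timestamp, end point included.
grid = np.arange(t[0], t[-1] + min_dif, min_dif)

# Linearly interpolate v onto the regular grid.
vsampled = np.interp(grid, t, v)
print(grid)      # [0. 1. 2. 3. 4. 5.]
print(vsampled)  # [0. 2. 4. 4. 4. 5.]
```

The value at 3.0 is interpolated between the samples at 2.0 and 4.0, even though the input has no sample there.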
Upvotes: 3