Reputation: 1192
I'm trying to resize an array in Python and fill the new positions with mean values. Example:
More advanced: I've got an array with e.g. 1000 samples, but I know it should be 1300 samples long. How can I scale the array to the new length and fill it with well-distributed mean values? A solution with interpolation would make me happy too.
Edit: I was asked for an example of what I mean by well-distributed values. E.g.: a sensor should deliver data at 100 Hz, but sometimes the sensor cannot sustain the full sampling frequency. Instead of getting 1300 samples in 13 seconds, I get a random amount between 900 and 1300 samples, and I don't know when a value is missing. I want to distribute the missing values uniformly over the whole array and assign them a meaningful value.
Thank you
Upvotes: 2
Views: 289
Reputation: 1192
I've written a solution that works even better for me. I had some problems with floating-point errors on large arrays; to correct for those, I insert some of the missing values at random indices. Maybe someone knows how to avoid this. I'm sure the code can be optimized further; feel free to do so.
import numpy as np

def resizeArray(data, newLength):
    datalength = len(data)
    if datalength == newLength:
        return data
    appendIndices = []
    appendNow = 0
    step = newLength / datalength
    increase = step % 1
    for i in np.arange(0, datalength - 2, step):
        appendNow += increase
        if appendNow >= 1:
            # int() is needed: round(i, 0) returns a float, which is not a valid index
            appendIndices.append(int(round(i)))
            appendNow = appendNow % 1
    # still missing values due to floating-point errors?
    diff = newLength - datalength - len(appendIndices)
    if diff > 0:
        for i in range(0, diff):
            appendIndices.append(np.random.randint(1, datalength - 2))
    # insert the average of the two neighbours at the specified indices
    appendVals = [(data[i] + data[i + 1]) / 2 for i in appendIndices]
    a = np.insert(data, appendIndices, appendVals)
    return a
Upvotes: 0
Reputation: 221664
You can use a differentiation trick here with np.diff. Thus, assuming A as the input array, you can do -
out = np.empty(2*A.size-1)
out[0::2] = A
out[1::2] = (np.diff(A) + 2*A[:-1]).astype(float)/2 # Interpolated values
The trick here is that the difference between two consecutive elements, added to twice the previous element, gives the sum of those two elements; halving that sum yields their mean. We just use this trick throughout the extent of the input 1D array to get our desired interpolated array.
Sample run -
In [34]: A
Out[34]: array([ 2, 3, -20, 10, 4])
In [35]: out = np.empty(2*A.size-1)
...: out[0::2] = A
...: out[1::2] = (np.diff(A) + 2*A[:-1]).astype(float)/2
...:
In [36]: out
Out[36]: array([ 2. , 2.5, 3. , -8.5, -20. , -5. , 10. , 7. , 4. ])
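Since np.diff(A) is just A[1:] - A[:-1], the diff-based line simplifies algebraically to a plain pairwise mean, which may read more directly. A minimal check with the same sample values:

```python
import numpy as np

# np.diff(A) == A[1:] - A[:-1], so
# (np.diff(A) + 2*A[:-1]) / 2 == (A[1:] + A[:-1]) / 2,
# i.e. the pairwise mean of neighbouring elements.
A = np.array([2, 3, -20, 10, 4])
out = np.empty(2 * A.size - 1)
out[0::2] = A                      # originals at even slots
out[1::2] = (A[1:] + A[:-1]) / 2   # pairwise means at odd slots
print(out)  # [  2.    2.5   3.   -8.5 -20.   -5.   10.    7.    4. ]
```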
I think @thomas's solution would be the go-to approach here, as we are basically doing interpolation with a specific case in mind. But since I am mostly interested in performance, here's a runtime test comparing the two solutions -
In [62]: def interp_based(A): # @thomas's solution
...: new_length = 2*A.size-1
...: return np.interp(np.linspace(0,len(A)-1,new_length),range(len(A)),A)
...:
...: def diff_based(A):
...: out = np.empty(2*A.size-1)
...: out[0::2] = A
...: out[1::2] = (np.diff(A) + 2*A[:-1]).astype(float)/2
...: return out
...:
In [63]: A = np.random.randint(0,10000,(10000))
In [64]: %timeit interp_based(A)
1000 loops, best of 3: 932 µs per loop
In [65]: %timeit diff_based(A)
10000 loops, best of 3: 148 µs per loop
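Besides timing, the two functions can be cross-checked for agreement: at the doubled resolution, linear interpolation evaluates exactly at the original points and their midpoints, so both approaches should produce the same values up to floating-point rounding. A quick sketch:

```python
import numpy as np

def interp_based(A):  # @thomas's solution
    new_length = 2 * A.size - 1
    return np.interp(np.linspace(0, len(A) - 1, new_length), range(len(A)), A)

def diff_based(A):
    out = np.empty(2 * A.size - 1)
    out[0::2] = A
    out[1::2] = (np.diff(A) + 2 * A[:-1]).astype(float) / 2
    return out

A = np.random.randint(0, 10000, (10000,))
# The half-integer sample points make np.interp return exact midpoints,
# so the results agree up to rounding.
print(np.allclose(interp_based(A), diff_based(A)))  # True
```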
Upvotes: 1
Reputation: 1813
It depends on what you mean by well-distributed values. Assuming your values lie on an evenly spaced grid, the following solution using interpolation could make sense:
>>> import numpy as np
>>> a = np.array([2, 3, -20, 10, 4])
>>> new_length = 9
>>> b = np.interp(np.linspace(0, len(a) - 1, new_length), range(len(a)), a)
>>> b
array([ 2. , 2.5, 3. , -8.5, -20. , -5. , 10. , 7. , 4. ])
This will also work if len(a) = 1000 and new_length = 1300.
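That case from the question can be sketched directly; the sensor-like signal below is just a made-up placeholder:

```python
import numpy as np

# Hypothetical sensor trace: 1000 samples that should have been 1300.
a = np.sin(np.linspace(0, 13, 1000))
new_length = 1300
# Resample onto 1300 evenly spaced positions over the same index range.
b = np.interp(np.linspace(0, len(a) - 1, new_length), range(len(a)), a)

print(len(b))                        # 1300
print(b[0] == a[0], b[-1] == a[-1])  # endpoints are preserved exactly
```

Each new sample is a linear blend of its two original neighbours, which matches the "fill with mean values, well distributed" requirement.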
Upvotes: 3