Reputation: 1262
Is there a way to "compress" an array in Python so that it keeps the same range but has fewer elements, down to a given count?
For example I have an array with 1000 elements and I want to modify it to have 100. Specifically I have a numpy array that is
x = linspace(-1,1,1000)
But because of the way I am using it in my project, I can't simply recreate it with linspace: it will not always span -1 to 1 or have 1000 elements. These parameters change, and I don't have access to them in the function I am defining. So I need a way to compress the array while keeping the -1 to 1 mapping. Think of it as decreasing the "resolution" of the array. Is this possible with any built-in functions or other libraries?
Upvotes: 4
Views: 6801
Reputation: 77407
You could pick items at random to reduce any bias you have in the reduction. If the original sample is unordered it would just be:
import random

sample = range(1000)

def reduce(sample, count):
    # Copy into a list so the shuffle works on any sequence (including a Python 3 range)
    work = list(sample)
    random.shuffle(work)
    return work[:count]
If order matters, use enumerate to track each item's position, then sort to restore the original order:
def reduce(sample, count):
    # Pair every item with its original index
    indexed = list(enumerate(sample))
    random.shuffle(indexed)
    trimmed = indexed[:count]
    # Sorting the (index, item) pairs restores the original order
    trimmed.sort()
    return [item for index, item in trimmed]
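Since the question is about a numpy array, the same order-preserving random pick can be sketched with numpy's random generator. This is only an illustration: the name `reduce_ordered` and the `seed` parameter are made up for this example and are not part of the answer above.

```python
import numpy as np

def reduce_ordered(sample, count, seed=None):
    """Pick `count` distinct items at random, keeping their original order."""
    arr = np.asarray(sample)
    rng = np.random.default_rng(seed)
    # Draw distinct positions, then sort them so the original order is kept
    idx = np.sort(rng.choice(len(arr), size=count, replace=False))
    return arr[idx]

x = np.linspace(-1, 1, 1000)
y = reduce_ordered(x, 100, seed=0)
print(len(y))
```

Note that a random pick does not guarantee that the endpoints -1 and 1 survive, so the range of the result may shrink slightly.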
Upvotes: 1
Reputation: 14118
A simple way to "resample" your array is to group it into chunks, then average each chunk:
(Chunking function is from this answer)
import numpy as np

# Chunking function: yield successive n-sized slices of l
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]

# Resampling function: average each chunk
def resample(arr, newLength):
    # Integer division; a plain / would give a float chunk size in Python 3
    chunkSize = len(arr) // newLength
    return [float(np.mean(chunk)) for chunk in chunks(arr, chunkSize)]

# Example:
x = np.linspace(-1, 1, 15)
y = resample(x, 5)
print(y)

# Result (approximately):
# [-0.857, -0.429, 0.0, 0.429, 0.857]
As you can see, the range of the resampled array does drift inward, because each end value is replaced by the mean of its chunk, but this effect would be much smaller for larger arrays.
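For numpy input, the chunk averaging can also be written without an explicit loop by reshaping the array into rows of one chunk each. A minimal sketch, assuming a 1-D array; the name `resample_mean` is hypothetical, and dropping the trailing elements that do not fill a whole chunk is a choice made for this example, not something the answer specifies:

```python
import numpy as np

def resample_mean(arr, new_length):
    """Average consecutive chunks of a 1-D array down to `new_length` values."""
    arr = np.asarray(arr, dtype=float)
    chunk_size = len(arr) // new_length
    if chunk_size == 0:
        raise ValueError("new_length must not exceed len(arr)")
    # Drop trailing elements that do not fill a complete chunk
    trimmed = arr[:chunk_size * new_length]
    # One row per chunk, then average along each row
    return trimmed.reshape(new_length, chunk_size).mean(axis=1)

x = np.linspace(-1, 1, 1000)
y = resample_mean(x, 100)
print(len(y), y[0], y[-1])  # the end values land near -0.991 and 0.991
```

With 1000 points reduced to 100, the inward drift at each end is under 0.01.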
It's not clear to me whether the arrays will always be generated by numpy.linspace or not. If so, there are simpler ways of doing this, like simply picking each nth member of the original array, where n is determined by the "compression" ratio:
def linearResample(arr, newLength):
    # Integer division: a slice step must be an int
    spacing = len(arr) // newLength
    return arr[::spacing]
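One caveat with a fixed step: `arr[::spacing]` usually stops short of the last element (for 1000 elements and a step of 10, the last index picked is 990), so the upper end of the range is lost. A possible variant, sketched here under the assumption that the input is a 1-D numpy array, picks evenly spaced indices that always include the first and last element. The name `linear_resample` is hypothetical.

```python
import numpy as np

def linear_resample(arr, new_length):
    """Pick `new_length` evenly spaced elements, always including both ends."""
    arr = np.asarray(arr)
    # Evenly spaced positions from the first to the last index, rounded to integers
    idx = np.round(np.linspace(0, len(arr) - 1, new_length)).astype(int)
    return arr[idx]

x = np.linspace(-1, 1, 1000)
y = linear_resample(x, 100)
print(len(y), y[0], y[-1])  # 100 -1.0 1.0
```

Because the indices are rounded, the spacing between picked values can vary by one original step when the lengths do not divide evenly.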
Upvotes: 3