Reputation: 1309
Suppose I have two arrays indicating the x and y coordinates of a calibration curve.
X = [1,2,3,4,5,6,7,8,9,10,12,14,16,18,20,30,40,50]
Y = [2,4,6,8,10,12,14,16,18,20,24,28,32,36,40,60,80,100]
My example arrays above contain 18 points. You'll notice that the x values are not linearly spaced; there are more points at lower values of x.
Let's suppose I need to reduce the number of points in my calibration curve to 13 points. Obviously, I could just remove the first five or the last five points, but that would shorten my overall range of x values. To maintain range and minimise the space between x values I would preferentially remove values x= 2,4,6,8,10. Removing these x points and their respective y values would leave 13 points in the curve as required.
How could I do this point selection and removal automatically in Python? I.e. Is there an algorithm to pick the best x points from a list, where "best" is defined as keeping the points as close as possible while keeping the overall range and adhering to the new number of points.
Please note that the points remaining must be in the original lists, so I can't interpolate the 18 points on to a 13 point grid.
Upvotes: 4
Views: 1832
Reputation: 15413
This would maximize the square root distances between the chosen points. It in some sense spreads the points as far as possible.
import itertools
list(max(itertools.combinations(sorted(X), 13), i
key=lambda l: sum((a - b) ** 2 for a, b in zip(l, l[1:]))))
Note that this is only feasible for small problems. The time complexity for selecting k points is O(k * (len(X) choose k))
, so basically O(exp(len(X))
. So don't even think about using this for, e.g., len(X) == 100
and k == 10
.
Upvotes: 3
Reputation: 51998
Here is a recursive approach that repeatedly removes the point which will be the least missed:
def mostRedundantPoint(x):
#returns the index, i, in the range 0 < i < len(x) - 1
#that minimizes x[i+1] - x[i-1]
#assumes len(x) > 2 and that x
#is sorted in ascending order
gaps = [x[i+1] - x[i-1] for i in range(1,len(x)-1)]
i = gaps.index(min(gaps))
return i+1
def reduceList(x,k):
if len(x) <= k:
return x
else:
i = mostRedundantPoint(x)
return reduceList(x[:i]+x[i+1:],k)
X = [1,2,3,4,5,6,7,8,9,10,12,14,16,18,20,30,40,50]
print(reduceList(X,13))
#prints [1, 3, 5, 7, 10, 12, 14, 16, 18, 20, 30, 40, 50]
This list essentially agrees with your intended output since 7 vs. 8 have the same net effect. It is reasonably quick in the sense that it is almost instantaneous in reducing sorted([random.randint(1,10**6) for i in range(1000)])
from 1000 elements to 100 elements. The fact that it is recursive implies that it will blow the stack if you try to remove many more points than that, but with what seems to be your intended problem size that shouldn't be an issue. If need be, you could of course replace the recursion by a loop.
Upvotes: 0
Reputation: 36033
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 30, 40, 50]
Y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 28, 32, 36, 40, 60, 80, 100]
assert len(X) == len(set(X)), "Duplicate X values found"
points = list(zip(X, Y))
points.sort() # sorts by X
while len(points) > 13:
# Find index whose neighbouring X values are closest together
i = min(range(1, len(points) - 1), key=lambda p: points[p + 1][0] - points[p - 1][0])
points.pop(i)
print(points)
Output:
[(1, 2), (3, 6), (5, 10), (7, 14), (10, 20), (12, 24), (14, 28), (16, 32), (18, 36), (20, 40), (30, 60), (40, 80), (50, 100)]
If you want the original series again:
X, Y = zip(*points)
Upvotes: 1
Reputation: 2776
An algorithm that would achieve that:
This would remove 2,4,6,8,10,3,...
Upvotes: 0