Mark
Mark

Reputation: 1309

"Sensibly" remove points in a Python list

Suppose I have two arrays indicating the x and y coordinates of a calibration curve.

X = [1,2,3,4,5,6,7,8,9,10,12,14,16,18,20,30,40,50]
Y = [2,4,6,8,10,12,14,16,18,20,24,28,32,36,40,60,80,100]

My example arrays above contain 18 points. You'll notice that the x values are not linearly spaced; there are more points at lower values of x.

Let's suppose I need to reduce the number of points in my calibration curve to 13 points. Obviously, I could just remove the first five or the last five points, but that would shorten my overall range of x values. To maintain range and minimise the space between x values I would preferentially remove values x= 2,4,6,8,10. Removing these x points and their respective y values would leave 13 points in the curve as required.

How could I do this point selection and removal automatically in Python? I.e. Is there an algorithm to pick the best x points from a list, where "best" is defined as keeping the points as close as possible while keeping the overall range and adhering to the new number of points.

Please note that the points remaining must be in the original lists, so I can't interpolate the 18 points on to a 13 point grid.

Upvotes: 4

Views: 1832

Answers (4)

Elmar Peise
Elmar Peise

Reputation: 15413

This would maximize the square root distances between the chosen points. It in some sense spreads the points as far as possible.

import itertools
list(max(itertools.combinations(sorted(X), 13), i
         key=lambda l: sum((a - b) ** 2 for a, b in zip(l, l[1:]))))

Note that this is only feasible for small problems. The time complexity for selecting k points is O(k * (len(X) choose k)), so basically O(exp(len(X)). So don't even think about using this for, e.g., len(X) == 100 and k == 10.

Upvotes: 3

John Coleman
John Coleman

Reputation: 51998

Here is a recursive approach that repeatedly removes the point which will be the least missed:

def mostRedundantPoint(x):
    #returns the index, i, in the range 0 < i < len(x) - 1
    #that minimizes x[i+1] - x[i-1]
    #assumes len(x) > 2 and that x
    #is sorted in ascending order

    gaps = [x[i+1] - x[i-1] for i in range(1,len(x)-1)]
    i = gaps.index(min(gaps))
    return i+1

def reduceList(x,k):
    if len(x) <= k:
        return x
    else:
        i = mostRedundantPoint(x)
        return reduceList(x[:i]+x[i+1:],k)

X = [1,2,3,4,5,6,7,8,9,10,12,14,16,18,20,30,40,50]
print(reduceList(X,13))
#prints [1, 3, 5, 7, 10, 12, 14, 16, 18, 20, 30, 40, 50]

This list essentially agrees with your intended output since 7 vs. 8 have the same net effect. It is reasonably quick in the sense that it is almost instantaneous in reducing sorted([random.randint(1,10**6) for i in range(1000)]) from 1000 elements to 100 elements. The fact that it is recursive implies that it will blow the stack if you try to remove many more points than that, but with what seems to be your intended problem size that shouldn't be an issue. If need be, you could of course replace the recursion by a loop.

Upvotes: 0

Alex Hall
Alex Hall

Reputation: 36033

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 30, 40, 50]
Y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 28, 32, 36, 40, 60, 80, 100]

assert len(X) == len(set(X)), "Duplicate X values found"

points = list(zip(X, Y))
points.sort()  # sorts by X

while len(points) > 13:
    # Find index whose neighbouring X values are closest together
    i = min(range(1, len(points) - 1), key=lambda p: points[p + 1][0] - points[p - 1][0])
    points.pop(i)

print(points)

Output:

[(1, 2), (3, 6), (5, 10), (7, 14), (10, 20), (12, 24), (14, 28), (16, 32), (18, 36), (20, 40), (30, 60), (40, 80), (50, 100)]

If you want the original series again:

X, Y = zip(*points)

Upvotes: 1

Javier
Javier

Reputation: 2776

An algorithm that would achieve that:

  1. Convert each number into the sum of the absolute difference to the number to the left and to the right. If a number is missing, first or last cases, then use MAX_INT. For example, 1 would become MAX_INT; 2 would become 2, 10 would become 3.
  2. Remove the first case with the lowest sum.
  3. If you need to remove more numbers, go to 1.

This would remove 2,4,6,8,10,3,...

Upvotes: 0

Related Questions