Longest increasing unique subsequence

Question

I have a list/array that looks something like this:

[ 0  1  2  3  4  5  6  7  3  9 10 11 13 13 14 15 16 17 18 19  4 16 22  5  3   
  2 10 17 34  5 11 18 27 14 11 15 29  2 11 10 19 32  8 27  1 32  6  2  0]

This list is supposed to be monotonic (strictly increasing). It is not, but you can see that it is mostly increasing. The values that do not fit this pattern can be considered noise, and I want them removed. So I want to extract the largest possible subset of this list that forms a strictly increasing sequence of numbers. There are many possible monotonic sequences here, but the point is to find the largest possible one.

It is important that I get the indices of the values to be removed, as I need to know the exact position of the remaining numbers (so instead of removing numbers we can replace them with e.g. None, nan, or -1).

I cannot change the order of any number, just remove the ones that do not fit in.

The remaining list has to be strictly increasing, so if we have e.g. [11 13 13 14], both of the 13s have to be removed.

If there are several possible solutions that are equally large, we cannot use any of them and must choose a solution with one number fewer. E.g. in [27 29 30 34 32] we have to throw away both 34 and 32, because we cannot choose one over the other. If we have [27 29 34 15 32] there is no possible solution, because we cannot choose between [27 29], [27 34], [29 34], or [15 32].

The best possible solution to the list presented above would be this:

[ 0  1  2  3  4  5  6  7 -1  9 10 11 -1 -1 14 15 16 17 18 19 -1 -1 22 -1 -1   
 -1 -1 -1 -1 -1 -1 -1 27 -1 -1 -1 29 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
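
One possible reading of these rules, which is an assumption rather than a confirmed definition, is: keep only the elements that belong to every longest strictly increasing subsequence. The sketch below uses a plain O(n^2) dynamic program; the name keep_unambiguous and the fill parameter are hypothetical and only for illustration. Under this reading [11 13 13 14] loses both 13s and [27 29 30 34 32] loses 34 and 32, but [27 29 34 15 32] would keep 27 and 29 instead of having no solution, so it may not match the intent exactly.

```python
from collections import Counter

def keep_unambiguous(seq, fill=-1):
    """Keep elements that lie on EVERY longest strictly increasing
    subsequence; replace all others with `fill` (assumed interpretation)."""
    n = len(seq)
    if n == 0:
        return []
    # ends[i]: length of the longest increasing subsequence ending at i
    ends = [1] * n
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i]:
                ends[i] = max(ends[i], ends[j] + 1)
    # starts[i]: length of the longest increasing subsequence starting at i
    starts = [1] * n
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if seq[i] < seq[j]:
                starts[i] = max(starts[i], starts[j] + 1)
    longest = max(ends)
    # Indices that appear in at least one longest subsequence
    on_some = [i for i in range(n) if ends[i] + starts[i] - 1 == longest]
    # An index is on every longest subsequence only if it is the sole
    # candidate for its position (ends[i]) within such a subsequence
    per_level = Counter(ends[i] for i in on_some)
    result = [fill] * n
    for i in on_some:
        if per_level[ends[i]] == 1:
            result[i] = seq[i]
    return result

print(keep_unambiguous([11, 13, 13, 14]))
print(keep_unambiguous([27, 29, 30, 34, 32]))
```

The positions are preserved because the result has the same length as the input, with fill marking every removed value.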

Can anyone think of an algorithm that would do this specific job? If you can get me part of the way, that would also be appreciated.

My only idea so far is a loop for n in range(N, 0, -1): where N is the size of the list. The loop would first try to find solutions of size n=N, and then for n=N-1, n=N-2, etc. When it finds exactly 1 solution for a specific n it stops and returns that solution. I'm not sure what should be inside the loop yet.
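
A literal, brute-force sketch of that loop could look like the following. It enumerates every index combination of size n with itertools.combinations, so it runs in exponential time and is only usable on very short lists. The function name unique_longest_bruteforce is hypothetical. Taken literally, this stopping rule reports no solution for [27 29 34 15 32], but also for [27 29 30 34 32] (several increasing subsequences of size 3 exist), so the rule probably needs refining.

```python
from itertools import combinations

def unique_longest_bruteforce(seq):
    """Return the indices of the increasing subsequence for the largest
    size n at which exactly one exists, or None if there is no such n.
    Exponential time: intended for tiny inputs only."""
    N = len(seq)
    for n in range(N, 0, -1):
        found = []
        for idx in combinations(range(N), n):
            # idx is already in ascending position order; check the values
            if all(seq[a] < seq[b] for a, b in zip(idx, idx[1:])):
                found.append(idx)
                if len(found) > 1:
                    break  # more than one candidate: this n is ambiguous
        if len(found) == 1:
            return list(found[0])
    return None

print(unique_longest_bruteforce([1, 2, 3, 0]))
print(unique_longest_bruteforce([27, 29, 34, 15, 32]))
```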

UPDATE:

Another SO question provides a Python algorithm for finding the longest increasing subsequence of a list. This is almost what I want to do, but not quite.

I have copied that function (see below) and added a little extra code at the end which changes the output if fullsize=True. In that case the original sequence is rebuilt with its original shape, but the numbers which are not part of the increasing sequence are replaced by nans. Then I check whether any number occurs more than once, and if so, replace all occurrences of that number with nans.

The original algorithm must still be changed since it does not provide unique solutions.
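
Since the exact positions matter, one option is to have the longest-increasing-subsequence step return indices instead of values, so the full-size output can be rebuilt without matching by value. The sketch below is a minimal version of that idea using the standard bisect module; lis_indices and mask_outside are hypothetical names, and it still returns just one of possibly several longest subsequences, so it does not address the uniqueness requirement.

```python
from bisect import bisect_left

def lis_indices(seq):
    """Indices of one longest strictly increasing subsequence of seq."""
    tails = []      # tails[k]: smallest last value of an increasing run of length k+1
    tail_idx = []   # position in seq of tails[k]
    prev = [-1] * len(seq)  # predecessor index for backtracking
    for i, x in enumerate(seq):
        k = bisect_left(tails, x)  # bisect_left enforces strict increase
        if k == len(tails):
            tails.append(x)
            tail_idx.append(i)
        else:
            tails[k] = x
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    out = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]

def mask_outside(seq, keep, fill=float("nan")):
    """Copy of seq with every position not in `keep` replaced by `fill`."""
    keep = set(keep)
    return [x if i in keep else fill for i, x in enumerate(seq)]

a = [0, 1, 2, 3, 4, 1, 2, 3, 4, 5]
idx = lis_indices(a)
print(idx)
print(mask_outside(a, idx))
```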

For example:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 
    18, 19, 20, 16, 35, 35, 33, 32, 1, 35, 13, 5, 32, 8, 35, 29, 19, 
    35, 19, 28, 32, 18, 31, 13, 3, 32, 33, 35, 31, 0, 21]
print(subsequence(a))

gives

[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.  16.  32.  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan]

Instead of ending with .. 16 32 nan .. it should have ended with ... 16 nan ... nan 31 nan nan 32 33 35 nan nan nan], as far as I can see.

Simpler example:

a = [0,1,2,3,4,1,2,3,4,5]
print(subsequence(a))

gives

[  0.   1.   2.   3.  nan  nan  nan  nan  nan   5.]

but it should only have given [0 nan ... nan 5], because 1 2 3 4 appears twice and is therefore not unique.

Here comes the current semi-working version of the code (which was used for my example runs):

import numpy as np

def subsequence(seq, fullsize=True):
    """
    Credit:
    http://stackoverflow.com/questions/3992697/longest-increasing-subsequence
    """

    M = [None] * len(seq)    # offset by 1 (j -> j-1)
    P = [None] * len(seq)

    # Since we have at least one element in our list, we can start by
    # knowing that there's at least an increasing subsequence of length one:
    # the first element.
    L = 1
    M[0] = 0

    # Looping over the sequence starting from the second element
    for i in range(1, len(seq)):
        # Binary search: we want the largest j <= L
        #  such that seq[M[j]] < seq[i] (default j = 0),
        #  hence we want the lower bound at the end of the search process.
        lower = 0
        upper = L

        # Since the binary search will not look at the upper bound value,
        # we'll have to check that manually
        if seq[M[upper-1]] < seq[i]:
            j = upper

        else:
            # actual binary search loop
            while upper - lower > 1:
                mid = (upper + lower) // 2
                if seq[M[mid-1]] < seq[i]:
                    lower = mid
                else:
                    upper = mid

            j = lower    # this will also set the default value to 0

        P[i] = M[j-1]

        if j == L or seq[i] < seq[M[j]]:
            M[j] = i
            L = max(L, j+1)

    # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
    result = []
    pos = M[L-1]
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]

    result = np.array(result[::-1])    # reversing

    if not fullsize:
        return result  # Original return from other SO question.

    # This was written by me, PaulMag:
    # Rebuild original sequence
    subseq = np.zeros(len(seq)) * np.nan
    for a in result:
        for i, b in enumerate(seq):
            if a == b:
                subseq[i] = a
            elif b > a:
                break
        if np.sum(subseq[np.where(subseq == a)].size) > 1:  # Remove duplicates.
            subseq[np.where(subseq == a)] = np.nan

    return subseq  # Alternative return made by me, PaulMag.
