Most efficient way to find longest incrementing subsequence in a list of lists

Question

I am doing some signal analysis, a part of which is to find the longest subsequence

I have dictionary like the following:

sequenceDict = {
    0: [168, 360, 470],
    1: [279, 361, 471, 633, 729, 817],
    2: [32, 168, 170, 350, 634, 730, 818],
    3: [33, 155, 171, 363, 635, 731, 765, 819],
    4: [352, 364, 732, 766, 822],
    5: [157, 173, 353, 577, 637, 733, 823, 969],
    6: [158, 174, 578, 638, 706, 734, 824],
    7: [159, 175, 579, 707, 735],
    8: [160, 464, 640, 708, 826],
    9: [173, 709, 757, 827],
    10: [174, 540, 642, 666, 710],
    11: [253, 667, 711],
    12: [254, 304, 668],
    13: [181, 255, 831],
    14: [256, 340, 646, 832],
    16: [184, 416], 
    17: [417], 
    18: [418], 
    19: [875], 
    20: [876], 
    23: [217], 
    24: [168, 218, 880], 
    25: [219, 765, 881], 
    26: [220, 766], 
    27: [221], 
    28: [768], 
    29: [3, 769], 
    30: [344, 476, 706]}

These are essentially always sorted indices of another array, I would like to find the longest incrementing sequence ( just like a longest increasing subsequence), by picking only one number from each key sequentially ( key 2 comes right after key 1 and so on), for example, from keys 0 and 1, [360, 361] is one sequence, and [470, 471] is another. I am calling these incrementing sequence, as these numbers should strictly increase by 1.

I have looked at things like patience sorting and so on, but since this problem is slightly different, and also has a tree of sequences, are there any known python implementations, or other efficient ways to do this except generating all possible sequences from this dict and then running patience sort?

Sait · Accepted Answer

In contrast to @6502's solution, this one not only keeping the best solution but also keeps track of every incrementing subsequences, if this is more helpful.

The idea is a similar to the sliding window approach. You start from the first list, update currentHotItems and globalHotItems dictionaries and then see the second list and update the dictionaries again, etc.

# fill missing indexes in the dictionary:
for i in range(min(sequenceDict), max(sequenceDict)):
    if i not in sequenceDict:
        sequenceDict[i] = []

# get only lists, ordered:
sortedItems = map(lambda x:x[1], sorted(sequenceDict.items(), key=lambda x:x[0]))    
globalHotItems = {} # (value, startIndex): length
currentHotItems = {} # value: length

for i in range(len(sortedItems)):
    updatedHotItems = {} # updated value: length
    for item in sortedItems[i]:
        if (item - 1) in currentHotItems:
            updatedHotItems[item] = currentHotItems[item-1] + 1
        else:
            updatedHotItems[item] = 1

    deadSet = set(currentHotItems.keys()) - \
            set(updatedHotItems.keys() + [key - 1 for key in updatedHotItems.keys()])

    for item in deadSet:
        globalHotItems[ (item-currentHotItems[item]+1, i-currentHotItems[item]) ] = currentHotItems[item]

    currentHotItems = updatedHotItems

print sorted(globalHotItems.items(), key=lambda x:x[1])[-1]

globalHotItems is the dictionary which contains the result. Keys are (value, startIndex) and Value's are the length.

For example, the last 4 items in globalHotItems:

print sorted(globalHotItems.items(), key=lambda x:x[1])[-4:]

is:

[((157, 5), 4), ((217, 23), 5), ((706, 6), 6), ((729, 1), 7)]

it means the best solution is length 7 and starts in the index=1 list as 729. And the best second solution is length 6 and start in index=6 list as 706, etc.

Complexity:

I think complexity should be again: O(input_size × average_number_of_sequences)

Most efficient way to find longest incrementing subsequence in a list of lists

Answers (2)

Related Questions