molonepa
molonepa

Reputation: 63

Python - measure longest subsequence of a certain value in a list with tolerance for error

Given a list of integers [0,0,0,1,1,0,2,2,0,1,0,0,2,1,2,2,2,2,1,...], I need to calculate the longest subsequence of elements of which x% of the elements are n, i.e. there is a tolerance whereby a subsequence which contains less than 1 - x% values other than n is still counted as an unbroken subsequence of n's.

I have used the following one-liner to get the longest subsequence where all values are n, but I don't know where to go from here:

longest_subsequence_0 = max((len(l) for n, l in itertools.groupby(list) if n == 0))

If anyone could steer me in the right direction it would be much appreciated :D

Upvotes: 2

Views: 271

Answers (2)

dantarno
dantarno

Reputation: 92

Just a sliding window changing size and computing the %. If above threshold, record the window size if this one is larger than the previous stored one.

def longest_sub(list, n, threshold):
    largest_window = 0
    for i in range(len(list)+1): # from i 
        for j in range(i+1,len(list)+1): # to j
            window_len = len(list[i: j]) # store window size
            if window_len > largest_window: # if inspected window > largest found yet
                if list[i:j].count(n)/window_len*100 > threshold: # if percentage above threshold
                    largest_window = window_len # new largest_window
    return largest_window

longest_sub([0,0,0,1,1,0,2,2,0,1,0,0,2,1,2,2,2,2,1], 0, 30) # 9 
longest_sub([0,0,0,1,1,0,2,2,0,1,0,0,2,1,2,2,2,2,1], 0, 80) # 3

Upvotes: 1

Albert James Teddy
Albert James Teddy

Reputation: 1375


I have this


import numpy as np

MAX = 10
a = 5
LEN = 100
threshold = 0.3
tolerance = 0.05

x = list(np.random.choice(list(range(MAX)), size=LEN))

window = LEN

largest = 0
while window > 1 and largest == 0:
    for i in range(0, LEN-window+1):
        ratio_of_a = x[i:i+window].count(a)/window
        if ratio_of_a > threshold - tolerance:
            largest = window

    window -= 1

print(f"Largest sequence found {largest}")

Pretty brute force, i used a instead of n also sorry for the confusion :)

Upvotes: 1

Related Questions