tetter

Reputation: 9

Python: finding recurring values in a list (backtesting)

Thanks in advance.

My problem is the following: I want to analyse a dataframe (a list) consisting only of values such as "x" and "y". Only when "x" appears at three consecutive indices do I want a statement that gives me the index of the third "x" (not the fourth or n-th one). This should then repeat over the whole list, giving me the indices of every place where "x" occurs at three consecutive indices.

0 = y
1 = x
2 = y
3 = x
4 = x
5 = x
6 = x
7 = y
8 = x
9 = x
10 = x

and so on.

Desired result of print(i):

5, 10

Upvotes: 0

Views: 80

Answers (2)

Thierry Lathuille

Reputation: 24281

A basic way to do it is to count the target values we see in a row, and to keep the indices when we have the exact number of values we expect:

def find_nth(data, target, n):
    out = []
    targets_in_a_row = 0
    for index, value in enumerate(data):
        if value != target:
            targets_in_a_row = 0
        else:
            targets_in_a_row += 1
            if targets_in_a_row == n:
                out.append(index)
    return out

data = ['y', 'x', 'y', 'x', 'x', 'x', 'x', 'y', 'x', 'x', 'x']
print(find_nth(data, 'x', 3))
# [5, 10]
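The same per-run counting can also be sketched with itertools.groupby, which splits the list into runs of equal values; every run of the target that is at least n long then contributes the index of its n-th element. This is an alternative sketch, not part of the answer above, and the name find_nth_in_runs is hypothetical.

```python
from itertools import groupby


def find_nth_in_runs(data, target, n):
    """Return the index of the n-th element of every run of `target` at least n long."""
    out = []
    index = 0  # index of the first element of the current run
    for value, run in groupby(data):
        length = sum(1 for _ in run)  # consume the run to get its length
        if value == target and length >= n:
            out.append(index + n - 1)  # position of the n-th element of this run
        index += length
    return out


data = ['y', 'x', 'y', 'x', 'x', 'x', 'x', 'y', 'x', 'x', 'x']
print(find_nth_in_runs(data, 'x', 3))
# [5, 10]
```

Because each run is handled as a whole, a run of seven "x" still contributes only one index, which matches the "not the fourth or n time" requirement from the question.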

Another way (easily adaptable to more complicated patterns, but less efficient in this case) is to use a collections.deque with a maximum length of n to keep the last n values we've seen. We can then easily check whether all of them are equal to the target.

We also need a flag (matched) that we set once we have n target values in a row and reset only when we get a different value, so that each run is reported only once.

from collections import deque

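def find_nth(data, target, n):
    d = deque(maxlen=n)  # holds only the last n values seen
    out = []
    matched = False  # True while inside a run that has already been reported

    for index, value in enumerate(data):
        d.append(value)
        if value != target:
            matched = False  # run broken: the next run of n targets can be reported
        elif not matched and all(val == target for val in d):
            out.append(index)  # n-th target in a row
            matched = True
    return out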


data = ['y', 'x', 'y', 'x', 'x', 'x', 'x', 'y', 'x', 'x', 'x']
print(find_nth(data, 'x', 3))
# [5, 10]
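To illustrate the "more complicated pattern" remark, here is a minimal sketch (not part of the original answer; find_pattern_ends is a hypothetical name) that compares the deque window against an arbitrary, non-empty pattern and records the index at which each match ends. Unlike find_nth above, it has no matched flag, so overlapping matches are all reported.

```python
from collections import deque


def find_pattern_ends(data, pattern):
    """Return the index where each (possibly overlapping) match of `pattern` ends.

    Assumes `pattern` is non-empty.
    """
    pattern = list(pattern)
    window = deque(maxlen=len(pattern))  # always holds the last len(pattern) values
    out = []
    for index, value in enumerate(data):
        window.append(value)
        if list(window) == pattern:  # window is full and equals the pattern
            out.append(index)
    return out


data = ['y', 'x', 'y', 'x', 'x', 'x', 'x', 'y', 'x', 'x', 'x']
print(find_pattern_ends(data, ['y', 'x', 'x']))
# [4, 9]
```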

Upvotes: 1

Megha Krishna

Reputation: 9

An easier way to implement the use case is:

  1. Find all the occurrences (indices) of 'x' in the data (as a list).
  2. Iterate through these indices in pairs and check whether they are consecutive, i.e. whether their difference is 1.
  3. If so, keep a count of consecutive pairs; when it reaches 2, the second index of the pair is the 3rd occurrence, so print it or add it to the final output list. Keep counting for the rest of that run without recording anything, and reset the count to 0 whenever two indices are not consecutive (e.g. 4, 5, 6, 7, 8 -> only 6 is recorded).

def third_occ(data):
    """
    First find all the occurrences of x in data -> all_occ_x,
    then iterate through consecutive pairs of those indices and check whether
    they are adjacent (difference of 1).
    Append an index to third_occ when it completes the second adjacent pair
    of a run, i.e. when it is the third consecutive occurrence.
    :return: list : third_occ
    """
    all_occ_x = [i for i, value in enumerate(data) if value == "x"]  # all occurrences of x in data
    count = 0  # number of adjacent pairs seen in the current run
    third_occ = []
    for n1, n2 in zip(all_occ_x[:-1], all_occ_x[1:]):
        if n2 - n1 == 1:
            count += 1
            if count == 2:  # n2 is the third consecutive occurrence
                third_occ.append(n2)
            # keep counting past 2 so the 4th, 5th, ... occurrence is not reported
        else:
            count = 0  # gap in the indices: a new run starts
    return third_occ


# element index for reference
# ex = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15']
data = ['x', 'x', 'x', 'x', 'y', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'y', 'x', 'x', 'x']
print(third_occ(data))
# [2, 7, 15]

Upvotes: 0
