Reputation: 253

Finding how many times a sequence repeats in a data frame using Python

Is there a way to find how many times a sequence repeats in a dataframe?

Let's say I have a dataframe with a large number of 1s and 3s, and I want to see how many times the sequence [3,1,3,3,1] repeats.

Here's an example list: 3,1,3,3,1,3,3,1,3,3,1,3,1,1,1,1,3,1,3,1,1,3,3,3

Here's an example of what I'm trying to do (the section being checked is in brackets):

This first part would be true: [3,1,3,3,1],3,3,1,3,3,1,3,1,1,1,1,3,1,3,1,1,3,3,3

This second part would be false: 3,1,3,3,1,[3,3,1,3,3],1,3,1,1,1,1,3,1,3,1,1,3,3,3

And the third part would be false: 3,1,3,3,1,3,3,1,3,3,[1,3,1,1,1],1,3,1,3,1,1,3,3,3

I want to analyze one section at a time, each as long as the sequence I'm trying to find, in the order the rows appear in the data frame.

My data is in a date-and-time format, but I can change that.

Thanks for all your help. I really appreciate everything everybody does on this site.

Upvotes: 3

Answers (3)

Prince

Reputation: 441

Step 1

Convert the list of integers into a string.

Step 2

Use the findall() function of the re module to find all occurrences of target_string in my_list_string.

import re
my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]

my_list_string = ''.join(str(e) for e in my_list)
target_string = ''.join(str(e) for e in target)

print(len(re.findall(target_string, my_list_string)))
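
One thing to be aware of: findall() scans the whole string, so it counts non-overlapping matches at any position, not only matches that line up with five-element sections. A small sketch of the difference, assuming every value is a single digit (the variable names here are only illustrative):

```python
import re

my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]

# Assumes single-digit values; with multi-digit values the joined strings
# become ambiguous (e.g. [1, 13] and [11, 3] both give "113").
haystack = "".join(str(v) for v in my_list)
needle = re.escape("".join(str(v) for v in target))

# Plain findall: matches may start anywhere but cannot overlap each other.
non_overlapping = len(re.findall(needle, haystack))

# A zero-width lookahead lets matches overlap, so every start position counts.
overlapping = len(re.findall("(?=" + needle + ")", haystack))

print(non_overlapping)  # 2
print(overlapping)      # 3
```

If only whole sections should be compared, as in the question, neither count is the one wanted; the list has to be split into chunks first.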

Upvotes: 0

Alexander

Reputation: 109546

This splits the list into consecutive chunks the same length as the target, and then compares each chunk to the target element by element.

from itertools import zip_longest  # named izip_longest on Python 2

my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]
n = len(target)
>>> sum(all(a == b for a, b in zip_longest(target, my_list[(i * n):((i + 1) * n)]))
        for i in range(len(my_list) // n))
1

Below is an alternative method that converts the integers to strings and then compares the strings.

target = ",".join(str(number) for number in target)
>>> target
'3,1,3,3,1'
>>> sum(",".join(str(number) for number in my_list[(i * n):(i * n + n)]) == target
        for i in range(len(my_list) // n))
1

To give some more intuition about what is going on: the list is chunked five elements at a time, and each chunk is joined into a string. These strings are then compared to the target string, which was converted the same way, and the matches are summed.

>>> [",".join(str(number) for number in my_list[(i * n):(i * n + n)])
     for i in range(len(my_list) // n)]
['3,1,3,3,1', '3,3,1,3,3', '1,3,1,1,1', '1,3,1,3,1']
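
For Python 3, the same chunk-by-chunk comparison can also be done on list slices directly, without converting to strings. A minimal sketch (the function name `chunk_matches` is just illustrative, not from any library):

```python
my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]


def chunk_matches(values, pattern):
    """Return one boolean per full chunk of len(pattern), in list order."""
    n = len(pattern)
    # Step through the list n elements at a time; a trailing partial
    # chunk (here the last 4 elements) is ignored.
    return [values[i:i + n] == pattern for i in range(0, len(values) - n + 1, n)]


flags = chunk_matches(my_list, target)
print(flags)       # [True, False, False, False]
print(sum(flags))  # 1
```

The list of booleans lines up with the true/false/false sections described in the question, and summing it gives the count.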

Upvotes: 0

Eelco Hoogendoorn

Reputation: 10759

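import numpy as np
my_list = np.array([3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3])
target = np.array([3, 1, 3, 3, 1])
n = len(target)
chunks = my_list[:len(my_list) // n * n].reshape(-1, n)  # trim the remainder so reshape works
(chunks == target[None, :]).all(axis=1)  # one boolean per chunk; .sum() counts the matches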
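
Since the question mentions a DataFrame, the reshape idea can be applied to a column as well. A rough sketch, assuming the values sit in a column called `value` (a made-up name) and that comparing whole sections in row order is what is wanted:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column name "value" is an assumption.
df = pd.DataFrame(
    {"value": [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]}
)
target = np.array([3, 1, 3, 3, 1])
n = len(target)

values = df["value"].to_numpy()
usable = len(values) // n * n            # drop rows that do not fill a whole section
chunks = values[:usable].reshape(-1, n)  # one row per section of length n
matches = (chunks == target).all(axis=1)

print(matches.tolist())    # [True, False, False, False]
print(int(matches.sum()))  # 1
```

The trailing rows that do not make up a full section are dropped before reshaping, because reshape needs the length to be a multiple of n.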

Upvotes: 2
