Reputation: 253
Is there a way to find how many times a sequence repeats in a dataframe?
Let's say I have a dataframe with a large number of 1s and 3s, and I want to see how many times the sequence [3,1,3,3,1] repeats.
Here's an example list: 3,1,3,3,1,3,3,1,3,3,1,3,1,1,1,1,3,1,3,1,1,3,3,3
Here's an example of what I'm trying to do:
The first part, 3,1,3,3,1, would be true.
The second part, 3,3,1,3,3, would be false.
The third part, 1,3,1,1,1, would be false.
I want to analyze one section at a time, each as long as the sequence I'm trying to find, in the order the rows appear in the dataframe.
My data is in a datetime format, but I can change that.
Thanks for all your help. I really appreciate everything everybody does on this site.
Upvotes: 3
Views: 96
Reputation: 441
Convert the list of integers into a string, then use the findall() function of the re module to find all occurrences of target_string in my_list_string.
import re
my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]
my_list_string = ''.join(str(e) for e in my_list)
target_string = ''.join(str(e) for e in target)
print(len(re.findall(target_string, my_list_string)))
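One thing to watch for: findall() counts non-overlapping matches at any offset, not only at chunk boundaries, so the snippet above reports 2 for the example list rather than 1. Here is a rough sketch of the difference, assuming single-digit values as in the example. The function names are made up for illustration, and the lookahead variant is an extra, not something the question asked for.

```python
import re

def count_non_overlapping(values, target):
    """Count matches at any offset, skipping past each match (plain findall)."""
    haystack = "".join(str(v) for v in values)
    needle = "".join(str(v) for v in target)
    return len(re.findall(re.escape(needle), haystack))

def count_overlapping(values, target):
    """Count matches at any offset, allowing matches to share elements."""
    haystack = "".join(str(v) for v in values)
    needle = "".join(str(v) for v in target)
    # A zero-width lookahead matches at every start position without consuming text.
    return len(re.findall("(?=" + re.escape(needle) + ")", haystack))

my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]
print(count_non_overlapping(my_list, target))  # 2
print(count_overlapping(my_list, target))      # 3
```

If the values can have more than one digit, joining them without a separator becomes ambiguous, so this approach would need a delimiter.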
Upvotes: 0
Reputation: 109546
This splits the list into sequential chunks the same length as the target and compares each chunk to the target element by element.
>>> from itertools import zip_longest  # named izip_longest in Python 2
>>> my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
>>> target = [3, 1, 3, 3, 1]
>>> n = len(target)
>>> sum(all(a == b for a, b in zip_longest(target, my_list[(i * n):((i + 1) * n)]))
...     for i in range(len(my_list) // n))
1
Below is an alternative method that converts the integers to strings and then compares the strings.
>>> target = ",".join(str(number) for number in target)
>>> target
'3,1,3,3,1'
>>> sum(",".join(str(number) for number in my_list[(i * n):(i * n + n)]) == target
...     for i in range(len(my_list) // n))
1
To give some more intuition on what is going on: the list is chunked five elements at a time, and the elements of each chunk are joined into a string. Each string is then compared to the target string, which was converted the same way, and the number of matches is summed.
>>> [",".join(str(number) for number in my_list[(i * n):(i * n + n)])
...  for i in range(len(my_list) // n)]
['3,1,3,3,1', '3,3,1,3,3', '1,3,1,1,1', '1,3,1,3,1']
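The chunking idea can also be wrapped in a small function that compares list slices directly, which skips the string conversion. This is only a sketch: count_aligned_chunks is a name I made up, and it assumes a plain Python list as input, so a dataframe column would have to be converted to a list first.

```python
def count_aligned_chunks(values, target):
    """Count how many consecutive, non-overlapping chunks of values equal target."""
    n = len(target)
    if n == 0:
        return 0
    target = list(target)
    # Step through the list n elements at a time; a trailing partial chunk is ignored.
    return sum(
        list(values[start:start + n]) == target
        for start in range(0, len(values) - n + 1, n)
    )

my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
print(count_aligned_chunks(my_list, [3, 1, 3, 3, 1]))  # 1
```

Because it compares elements rather than characters, it should behave the same for multi-digit numbers or other comparable values.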
Upvotes: 0
Reputation: 10759
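import numpy as np
my_list = np.array([3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3])
target = np.array([3, 1, 3, 3, 1])
# Trim the trailing partial chunk (24 is not a multiple of 5) so the reshape succeeds.
usable = len(my_list) // len(target) * len(target)
(my_list[:usable].reshape(-1, len(target)) == target[None, :]).all(axis=1)  # one boolean per chunk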
Upvotes: 2