Wiktor Kisielewski
Wiktor Kisielewski

Reputation: 437

Python find similar combinations of elements in a list

So I have a list that looks a bit like this:

my_list = [0,1,1,1,0,0,1,0,1,0,1,1,0,0,0,1,0,1,1,0,1 ... 0,1,0]

it contains thousands of 0's and 1's basically. Im looking for a way to find similar (repeating) combinations of elements in it (10 next elements to be specific). So (for example) if there is a :

... 0,1,1,1,0,0,1,1,0,1 ...

combination and it appears more than once I would like to know where it is in my list (index) and how many times it repeats.

I need to check all possible combinations here, that is 1024 possibilities...

Upvotes: 0

Views: 289

Answers (4)

Karl Anka
Karl Anka

Reputation: 2849

Treat the elements as bits that can be converted to integers. The solution below converts the input list to integers, find number of occurrence of each integer and what index they can be found on.

import collections

x = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
as_int = []

# given the input above there is no pattern longer than 6 that occure more than once...
pattern_length = 6

# convert input to a list of integers
# can this be done in a nicer way, like skipping the string-conversion?
for s in range(len(x) - pattern_length+1) :
    bitstring = ''.join([str(b) for b in x[s:s+pattern_length]])
    as_int.append(int(bitstring,2))

# create a dict with integer as key and occurence as value
count_dict = collections.Counter(as_int)

# empty dict to store index for each integer
index_dict = {}

# find index for each integer that occur more than once
for key in dict(count_dict):
    if count_dict[key] > 1:
        indexes = [i for i, x in enumerate(as_int) if x == key]
        index_dict[key] = indexes

#print as binary together with its index
for key, value in index_dict.items():
    print('{0:06b}'.format(key), 'appears', count_dict[key], 'times, on index:', value)

Output:

101011 appears 2 times, on index: [6, 18]
010110 appears 2 times, on index: [7, 14]

Upvotes: 1

h4z3
h4z3

Reputation: 5459

It looks like a homework problem, so I don't want to give the solution at once, just hints.


Don't look at it literally. It's 0s and 1s, so you can look at them like at binary numbers.

Some hints:

  • 1024 "patterns" become just numbers from 0 to 1023.
  • Checking for a pattern is making a number from those 10 digits.

Think how you would do that then.

More hints, more technical:

  • If you have one number pattern, e.g. from 0th to 9th element, you can get 1st to 10th pattern by taking 9-digit (from 1st index to 9th index) value (aka %512), "move" them left (*2) and add the 10th digit.
  • Make a dictionary or list of lists where key/index is the pattern number (0 to 1023) and list contains indexes of the start pattern.

I'll edit this answer later to provide an example solution but I gotta take a short break first.


Edit:

Customisable base and length with defaults for your case.

def find_patterns(my_list, base=2, pattern_size=10):
    modulo_value = base ** (pattern_size-1)
    results = [[] for _ in range(base ** pattern_size)]
    current_value = 0
    for index, elem in enumerate(a):
        if index < pattern_size:
            current_value = base*current_value + elem
        elif index == pattern_size:
            results[current_value].append(0)
        if index >= pattern_size:
            current_value = base*(current_value % modulo_value) + elem
            results[current_value].append(index+1-pattern_size)  #index of the first element in the pattern
    return results

Upvotes: 2

Dani Mesejo
Dani Mesejo

Reputation: 61900

IIUC, you could do:

my_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]

w = 10
occurrences = {}
for i in range(len(my_list) - w + 1):
    key = tuple(my_list[i:i+w])
    occurrences.setdefault(key, []).append(i)

for pattern, indices in occurrences.items():
    print(pattern, indices)

Output

(0, 1, 1, 1, 0, 0, 1, 0, 1, 0) [0]
(1, 1, 1, 0, 0, 1, 0, 1, 0, 1) [1]
(1, 1, 0, 0, 1, 0, 1, 0, 1, 1) [2]
(1, 0, 0, 1, 0, 1, 0, 1, 1, 0) [3]
(0, 0, 1, 0, 1, 0, 1, 1, 0, 0) [4]
(0, 1, 0, 1, 0, 1, 1, 0, 0, 0) [5]
(1, 0, 1, 0, 1, 1, 0, 0, 0, 1) [6]
(0, 1, 0, 1, 1, 0, 0, 0, 1, 0) [7]
(1, 0, 1, 1, 0, 0, 0, 1, 0, 1) [8]
(0, 1, 1, 0, 0, 0, 1, 0, 1, 1) [9]
(1, 1, 0, 0, 0, 1, 0, 1, 1, 0) [10]
(1, 0, 0, 0, 1, 0, 1, 1, 0, 1) [11]
(0, 0, 0, 1, 0, 1, 1, 0, 1, 0) [12]
(0, 0, 1, 0, 1, 1, 0, 1, 0, 1) [13]
(0, 1, 0, 1, 1, 0, 1, 0, 1, 0) [14]

Upvotes: 1

Christian Sloper
Christian Sloper

Reputation: 7510

Here is a solution using regex:

import random
from itertools import product
import re

testlist = [str(random.randint(0,1)) for i in range(1000)]

testlist_str = "".join(testlist)

for i in ["".join(seq) for seq in product("01", repeat=10)]:
    print(f'pattern {i} has {len(re.findall(i, testlist_str))} matches')

outputs:

pattern 0000000000 has 0 matches
pattern 0000000001 has 0 matches
pattern 0000000010 has 1 matches
pattern 0000000011 has 2 matches
pattern 0000000100 has 2 matches
pattern 0000000101 has 2 matches
....

Upvotes: 2

Related Questions