Maurizio Cirilli

Reputation: 103

Python - how to recursively search a variable substring in texts that are elements of a list

Let me explain more precisely what I mean in the title.
Here are example strings to search (strings of variable length, each one an element of a list that is very large in reality):

STRINGS = ['sftrkpilotndkpilotllptptpyrh', 'ffftapilotdfmmmbtyrtdll', 'gftttepncvjspwqbbqbthpilotou', 'htfrpilotrtubbbfelnxcdcz']

The string that I know for sure contains the substring to find:

SOURCE = ['gfrtewwxadasvpbepilotzxxndffc']

I am trying to write a Python 3 program that finds this hidden 5-character word in SOURCE and the position(s) at which it occurs in each element of STRINGS.

I am also trying to store the results in an array or a dictionary (I do not know which is more convenient at the moment).

Moreover, I need to perform other searches of the same type with different LENGTH values, so the length should be passed in as a variable to make the code more general.

I know that the first point has already been solved in previous posts, but never (as far as I know) together with the second point, which is the part of the code I have not been able to handle successfully (I am not posting my code because I know it is too far from being fixable).

Any help from this great community is highly appreciated.

-- Maurizio

Upvotes: 3

Views: 129

Answers (1)

a_guest

Reputation: 36319

You can iterate over the source string and, for each substring of the given length, find its positions within each of the other strings (either with the re module or with the built-in str.startswith). If at least one occurrence was found in every string, yield the result:

import re

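def find(source, strings, length):
    # Slide a window of `length` characters over the source string.
    # The +1 makes sure the last window (ending at the final character) is included.
    for i in range(len(source) - length + 1):
        sub = source[i:i+length]
        positions = {}
        for s in strings:
            # Regex alternative (reports non-overlapping matches only):
            # positions[s] = [m.start() for m in re.finditer(re.escape(sub), s)]
            # Using built-in functions; this also reports overlapping matches.
            positions[s] = [j for j in range(len(s)) if s.startswith(sub, j)]
            if not positions[s]:
                break  # `sub` is missing from this string, so skip it.
        else:
            # The inner loop finished without `break`: `sub` occurs in every string.
            yield sub, positions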
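The commented-out re.finditer line and the str.startswith line are not fully interchangeable: re.finditer returns non-overlapping matches, while testing str.startswith at every index also reports overlapping ones. This only matters when the substring can overlap itself (not the case for 'pilot'), but the small sketch below, using a made-up text 'aaaa', shows the difference and a zero-width lookahead that makes re behave like the startswith version:

```python
import re

text = 'aaaa'
sub = 'aa'

# re.finditer scans left to right and resumes after each match,
# so it only reports non-overlapping occurrences.
non_overlapping = [m.start() for m in re.finditer(re.escape(sub), text)]

# Testing str.startswith at every index reports overlapping occurrences too.
overlapping = [j for j in range(len(text)) if text.startswith(sub, j)]

# A zero-width lookahead consumes no characters, so re also
# reports every (overlapping) start position.
lookahead = [m.start() for m in re.finditer('(?=' + re.escape(sub) + ')', text)]

print(non_overlapping)  # [0, 2]
print(overlapping)      # [0, 1, 2]
print(lookahead)        # [0, 1, 2]
```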

And the generator can be used as illustrated in the following example:

import pprint

pprint.pprint(dict(find(
    source='gfrtewwxadasvpbepilotzxxndffc',
    strings=['sftrkpilotndkpilotllptptpyrh',
             'ffftapilotdfmmmbtyrtdll',
             'gftttepncvjspwqbbqbthpilotou',
             'htfrpilotrtubbbfelnxcdcz'],
    length=5
)))

which produces the following output:

{'pilot': {'ffftapilotdfmmmbtyrtdll': [5],
           'gftttepncvjspwqbbqbthpilotou': [21],
           'htfrpilotrtubbbfelnxcdcz': [4],
           'sftrkpilotndkpilotllptptpyrh': [5, 13]}}
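Since the real list is very large, a possible variation (a sketch, not benchmarked, assuming the sets of fixed-length substrings of each string fit in memory) is to first intersect the sets of all `length`-character substrings, and only then compute positions for the few surviving candidates. The function name `find_common` is hypothetical; it returns a plain dictionary directly instead of being a generator, and it sorts the candidates so the output order is deterministic:

```python
def find_common(source, strings, length):
    """Map each length-`length` substring of `source` that occurs in every
    string of `strings` to its (possibly overlapping) start positions."""
    # Distinct substrings of the requested length in the source.
    candidates = {source[i:i + length] for i in range(len(source) - length + 1)}
    # Keep only substrings present in every string (set intersection).
    for s in strings:
        candidates &= {s[i:i + length] for i in range(len(s) - length + 1)}
        if not candidates:
            return {}  # Nothing common to all strings: stop early.
    # Compute positions only for the surviving candidates.
    return {
        sub: {s: [j for j in range(len(s) - length + 1) if s.startswith(sub, j)]
              for s in strings}
        for sub in sorted(candidates)
    }


source = 'gfrtewwxadasvpbepilotzxxndffc'
strings = ['sftrkpilotndkpilotllptptpyrh',
           'ffftapilotdfmmmbtyrtdll',
           'gftttepncvjspwqbbqbthpilotou',
           'htfrpilotrtubbbfelnxcdcz']

# The length is just a parameter, so several searches can be run in a loop.
for length in (4, 5, 6):
    print(length, find_common(source, strings, length))
```

Each string is scanned once per call to build its substring set, instead of once per candidate substring of the source, which is where the saving would come from on long inputs.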

Upvotes: 3
