Robert Hemingway

Reputation: 121

Finding characters in a string based on white spaces

So I am trying to get a function working that will return a new list of single characters that immediately follow two other given characters. Like so:

def filter_possible_chars(corpus, last):
    """
    >>> filter_possible_chars('lazy languid line', 'la')
    ['z', 'n']
    >>> filter_possible_chars('pitter patter', 'tt')
    ['e', 'e']
    """
    char_list = []
    corpus_split = corpus.split()
    for word in corpus_split:
        if last in word:
            word_split = word.split(last)
            follows_last = word_split[1]
            char_list.append(follows_last[0])
    return char_list

This function works for the examples given in the docstring, but I also need it to handle cases where last contains white space, e.g.:

>>> filter_possible_chars('when the goat jumped to the rock', ' t')

And it would return:

['h', 'o', 'h']

But since my function splits the corpus on white space, and so discards it, I think I need an entirely different method here. I thought about not splitting the string into individual words and instead indexing into it with the given characters, but I cannot think of a way to make that work for more than one occurrence in a string.

Upvotes: 2

Views: 123

Answers (2)

Joran Beasley

Reputation: 114018

>>> import re
>>> pat="tt"
>>> corpus="pitter patter"
>>> print(re.findall("%s(.)"%pat,corpus))
['e', 'e']
>>> corpus,pat = 'when the goat jumped to the rock', ' t'
>>> re.findall("%s(.)"%pat,corpus)
['h', 'o', 'h']
>>> corpus,pat = 'lazy languid line', 'la'
>>> re.findall("%s(.)"%pat,corpus)
['z', 'n']

Explanation

  • % is the string formatting operator, so for example "%s(.)" % "la" evaluates to "la(.)".

  • In regular expressions, . is the pattern for "any character", and () define groups whose values can be retrieved later, e.g. using findall:

    If one or more groups are present in the pattern, return a list of groups

So, for example, the pattern la(.) means "search for la followed by any character, and capture that character".
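Wrapped into the function from the question, the session above might look like the sketch below. The re.escape call is an addition that is not in the original snippet: it assumes last should be matched as literal text, so that characters such as . or * in it are not treated as regex syntax. Note also that . does not match a newline by default, and that findall only reports non-overlapping matches.

```python
import re

def filter_possible_chars(corpus, last):
    """Return each character that directly follows an occurrence of `last`."""
    # Assumption: `last` is literal text, so escape any regex metacharacters.
    pattern = re.escape(last) + "(.)"
    # With exactly one group in the pattern, findall returns that group's
    # text for every non-overlapping match.
    return re.findall(pattern, corpus)

print(filter_possible_chars('when the goat jumped to the rock', ' t'))
# ['h', 'o', 'h']
```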

Upvotes: 4

running.t

Reputation: 5709

Your idea for solving this is perfectly fine: instead of splitting the sentence into words, find all instances of last in the full corpus. In fact, the split function can do this for you.

corpus = 'when the goat jumped to the rock'
spl = corpus.split(' t')
print(spl)
# ['when', 'he goat jumped', 'o', 'he rock']
# Skip the first piece (it precedes the first match) and any empty pieces.
res = [x[0] for x in spl[1:] if len(x) > 0]
print(res)
# ['h', 'o', 'h']

So you can split corpus by last, take every piece of the result except the first (since it is not preceded by last), and then take the first character of each remaining non-empty piece.
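The indexing idea from the question can also be made to work for several occurrences by calling str.find repeatedly with a moving start position. The following is only a sketch of that idea, not part of the split approach above. It assumes that overlapping occurrences of last should each count, which differs from split (for 'aaaa' and 'aa', split yields no characters, while this loop yields two).

```python
def filter_possible_chars(corpus, last):
    """Collect the character after every occurrence of `last` in `corpus`."""
    chars = []
    start = corpus.find(last)
    while start != -1:
        follow = start + len(last)   # index just past this occurrence
        if follow < len(corpus):     # nothing follows a match at the very end
            chars.append(corpus[follow])
        # Assumption: resume one position later so overlapping
        # occurrences of `last` are found as well.
        start = corpus.find(last, start + 1)
    return chars

print(filter_possible_chars('when the goat jumped to the rock', ' t'))
# ['h', 'o', 'h']
```

If overlapping occurrences should not count, resuming the search at follow instead of start + 1 would be the change to make.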

Upvotes: 3
