Reputation: 121
So I am trying to get a function working that will return a new list of single characters that immediately follow two other given characters. Like so:
def filter_possible_chars(corpus, last):
    """
    >>> filter_possible_chars('lazy languid line', 'la')
    ['z', 'n']
    >>> filter_possible_chars('pitter patter', 'tt')
    ['e', 'e']
    """
    char_list = []
    corpus_split = corpus.split()
    for word in corpus_split:
        if last in word:
            word_split = word.split(last)
            follows_last = word_split[1]
            char_list.append(follows_last[0])
    return char_list
This function works for the examples given in the docstring; however, I also need it to handle cases where the given characters include white space, e.g.:
>>> filter_possible_chars('when the goat jumped to the rock', ' t')
And it would return:
['h', 'o', 'h']
But since my function splits the corpus on white space and so discards it, I think I need to try an entirely different method here. I thought about not splitting the string into individual words and instead indexing into it with the given letters, but I cannot think of a way to make that work for more than one instance in a string.
Upvotes: 2
Views: 123
Reputation: 114018
>>> import re
>>> pat="tt"
>>> corpus="pitter patter"
>>> print(re.findall("%s(.)"%pat,corpus))
['e', 'e']
>>> corpus,pat = 'when the goat jumped to the rock', ' t'
>>> re.findall("%s(.)"%pat,corpus)
['h', 'o', 'h']
>>> corpus,pat = 'lazy languid line', 'la'
>>> re.findall("%s(.)"%pat,corpus)
['z', 'n']
% is the string formatting operator, so for example "%s(.)" % "la" evaluates to "la(.)".
In regular expressions, . is the pattern for "any character" (except a newline, by default), and () define groups whose values can be retrieved later, e.g. using findall:
If one or more groups are present in the pattern, return a list of groups
So, for example, the pattern la(.) means "search for la followed by any character, and capture that character".
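One caveat with building the pattern by string formatting: if last contains regex metacharacters such as . or *, they are read as pattern syntax rather than literal characters. A minimal sketch that escapes last with re.escape first, assuming the function name and signature from the question:

```python
import re


def filter_possible_chars(corpus, last):
    """Return every character that immediately follows `last` in `corpus`."""
    # re.escape makes any metacharacters in `last` match literally.
    # "." does not match a newline unless re.DOTALL is passed.
    return re.findall(re.escape(last) + "(.)", corpus)


print(filter_possible_chars('when the goat jumped to the rock', ' t'))
```

Note that findall returns non-overlapping matches, so the character captured by one match cannot also start the next occurrence of last.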
Upvotes: 4
Reputation: 5709
Your idea of how to solve this is perfectly fine. Instead of splitting the sentence into words, you should find all instances of last in the full corpus. In fact, the split function can do this for you.
corpus = 'when the goat jumped to the rock'
spl = corpus.split(' t')
print(spl)
# ['when', 'he goat jumped', 'o', 'he rock']
res = [x[0] for x in spl[1:] if len(x) > 0]
print(res)
# ['h', 'o', 'h']
So you can split corpus by last, then take every string from the result except the first one (as it was not preceded by last), and then take the first letter of each remaining string.
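Wrapped into the function from the question, the steps above might look like the following. This is only a sketch: the name and signature are taken from the question, and the local name pieces is made up for illustration.

```python
def filter_possible_chars(corpus, last):
    """Return every character that immediately follows `last` in `corpus`."""
    # Every piece after the first one was preceded by `last` in `corpus`.
    pieces = corpus.split(last)[1:]
    # Skip empty pieces: they occur when `last` ends the corpus or when
    # two occurrences of `last` sit back to back.
    return [piece[0] for piece in pieces if piece]


print(filter_possible_chars('when the goat jumped to the rock', ' t'))
```

Because white space is never used as a separator here, a value of last that starts with a space is handled the same way as any other.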
Upvotes: 3