Reputation: 135
Write a function called getWords(sentence, letter)
that takes in a sentence and a single letter, and returns a list of the words that start or end with this letter, but not both, regardless of the letter case.
For example:
>>> s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> getWords(s, "t")
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
My attempt:
regex = (r'[\w]*'+letter+r'[\w]*')
return (re.findall(regex,sentence,re.I))
My Output:
['The', 'TART', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'until', 'next']
Upvotes: 1
Views: 16711
Reputation: 37691
This is much easier with the startswith() and endswith() methods.
def getWords(s, letter):
    letter = letter.lower()
    # Keep words that start or end with the letter, but not both.
    return [word for word in s.split()
            if (word.lower().startswith(letter) or word.lower().endswith(letter))
            and not (word.lower().startswith(letter) and word.lower().endswith(letter))]
mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(mystring, 't'))
Output
['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
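The trailing comma in 'Thursdays,' in the output above is there because split() only separates on whitespace. A minimal sketch of one way to handle that, assuming it is acceptable to strip leading and trailing punctuation using string.punctuation (the name get_words_stripped is hypothetical and not part of the answer):

```python
import string

def get_words_stripped(sentence, letter):
    """Return words that start or end with `letter` (case-insensitive), but not both."""
    letter = letter.lower()
    result = []
    for raw in sentence.split():
        # Drop punctuation such as ',' or '.' from both ends of the token.
        word = raw.strip(string.punctuation)
        if not word:
            continue  # the token consisted of punctuation only
        lowered = word.lower()
        # '!=' on two booleans acts as exclusive or: exactly one check may be true.
        if lowered.startswith(letter) != lowered.endswith(letter):
            result.append(word)
    return result

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_stripped(s, "t"))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```

Stripping only the ends leaves inner punctuation (for example an apostrophe in "don't") untouched, which may or may not be what the exercise expects.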
Update (using regular expression)
import re
result1 = re.findall(r'\b[t]\w+|\w+[t]\b', mystring, re.I)
result2 = re.findall(r'\b[t]\w+[t]\b', mystring, re.I)
print([x for x in result1 if x not in result2])
Explanation
The regular expressions \b[t]\w+ and \w+[t]\b find words that start or end with the letter t, and \b[t]\w+[t]\b finds words that both start and end with the letter t.
After generating the two lists of words, take the difference: keep the words from the first list that do not appear in the second.
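The same two-pass idea can be wrapped in a function that accepts any letter. This is only a sketch: the name get_words_two_pass is hypothetical, re.escape is added as a precaution in case the character is special in a regex, and the second pattern uses \w* rather than \w+ so that a two-letter word such as "tt" is also treated as starting and ending with the letter.

```python
import re

def get_words_two_pass(sentence, letter):
    ch = re.escape(letter)
    # Words that start with the letter or end with it (includes words that do both).
    either = re.findall(r'\b{0}\w+|\w+{0}\b'.format(ch), sentence, re.I)
    # Words that both start and end with the letter.
    both = set(re.findall(r'\b{0}\w*{0}\b'.format(ch), sentence, re.I))
    # Filtering the first list keeps sentence order, which a set difference would not.
    return [word for word in either if word not in both]

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_two_pass(mystring, 't'))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```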
Upvotes: 3
Reputation: 177546
\b detects word breaks. Verbose mode allows multi-line regexes and comments. Note that [^\W] is the same as \w, but to match \w except a certain letter, you need [^\W{letter}].
import re

def getWords(s, t):
    pattern = r'''(?ix)           # ignore case, verbose mode
                  \b{letter}      # start with letter
                  \w*             # zero or more additional word characters
                  [^{letter}\W]\b # ends with a word character that isn't letter
                  |               # OR
                  \b[^{letter}\W] # does not start with a non-word character or letter
                  \w*             # zero or more additional word characters
                  {letter}\b      # ends with letter
               '''.format(letter=t)
    return re.findall(pattern, s)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s,'t'))
Output:
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
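The negated character class used in the pattern can be checked in isolation. The snippet below is a small illustration with made-up test strings, assuming the letter is 't':

```python
import re

# [^\W] is a double negative ("not a non-word character"), so it behaves like \w.
same = re.findall(r'[^\W]', 'at_9!') == re.findall(r'\w', 'at_9!')
print(same)  # True

# Adding a letter inside the negated class excludes that letter as well.
# With the (?i) flag, both 'T' and 't' are excluded.
remaining = re.findall(r'(?i)[^t\W]', 'Tart_9!')
print(remaining)  # ['a', 'r', '_', '9']
```

This is why the pattern rejects 'TART': after the leading T, the final character must be a word character other than t.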
Upvotes: 5
Reputation: 49320
Why are you using regex for this? Just check the first and last character.
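def getWords(s, letter):
    letter = letter.lower()
    # Keep a word when exactly one of its first and last characters is the letter.
    # Indexing w[0] and w[-1] also works for one-letter words, where a slice step of 0 would raise.
    return [w for w in s.split() if (w[0].lower() == letter) != (w[-1].lower() == letter)]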
Upvotes: 1
Reputation: 350137
If you want a regex for this, then use:
regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
The replace is done to avoid the repeated verbose +letter+.
So the code looks like this then:
import re
def getWords(sentence, letter):
    regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
    return re.findall(regex, sentence, re.I)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
result = getWords(s, "t")
print(result)
Output:
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
I have used # as a placeholder for the actual letter; it gets replaced in the regular expression before it is actually used.
\b: word break
\w*: 0 or more word characters (letters, digits, or underscores)
[^#\W]: a word character that is not # (the given letter)
|: logical OR. The left side matches words that start with the letter but don't end with it, and the right side matches the opposite case.
Upvotes: 2
Reputation: 5660
You can try the built-in startswith and endswith string methods.
>>> string = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> [i for i in string.split() if i.lower().startswith('t') or i.lower().endswith('t')]
['The', 'TART', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
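The comprehension above still keeps 'TART', which both starts and ends with 't'. One way to exclude such words, sketched here under the assumption that punctuation attached to a word is left as-is, is to compare the two checks with != so that exactly one of them may be true (the variable names letter and words are introduced only for this illustration):

```python
string = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
letter = 't'

# startswith(...) != endswith(...) is True only when exactly one of the two holds.
words = [w for w in string.split()
         if w.lower().startswith(letter) != w.lower().endswith(letter)]
print(words)
# ['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
```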
Upvotes: 0