Reputation: 532
I'm trying to write a regular expression in Python that checks whether a word occurs in a string exactly "n" times.
For example, what expression would match only if the word "cat" appears exactly two times? How would I do that?
It should accept "The blue cat talks to the red cat in the tree", because it contains "cat" exactly two times.
But it should not accept "The cat is big", because it contains "cat" only once.
And it should not accept "the dog is yellow" either, because it does not contain "cat" at all.
Thanks a lot
EDIT Hey guys
Sorry for complicating the problem, but I forgot to mention one thing.
If I wanted to find "cat" exactly two times, "The catcat runs" should also match.
Upvotes: 1
Views: 1831
Reputation: 226316
Just build a regex with multiple instances of 'cat' separated by a pattern that consumes other characters:
>>> import re
>>> n = 2
>>> regex = re.compile('.*'.join([r'\bcat\b'] * n))
>>> regex.search('The cat is big')
>>> regex.search('The blue cat talks to the red cat in the tree')
<_sre.SRE_Match object at 0x17ca1a8>
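A small sketch of the same join idea, wrapped in a helper so it can be tried on more inputs. The function name `at_least_n_pattern` is made up for illustration and is not part of the answer above. Note the raw string for `\b` (in a normal string literal `'\b'` is a backspace character), and that this pattern accepts n or more occurrences, not exactly n.

```python
import re

def at_least_n_pattern(word, n):
    # Hypothetical helper: n copies of the whole word, joined by '.*'.
    piece = r'\b' + re.escape(word) + r'\b'
    return re.compile('.*'.join([piece] * n))

regex = at_least_n_pattern('cat', 2)

one = regex.search('The cat is big')                                  # no match
two = regex.search('The blue cat talks to the red cat in the tree')  # match
three = regex.search('cat, cat and cat')                             # also a match
```

Since three occurrences still match, this is only a first step if the goal is "exactly two".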
Upvotes: 0
Reputation: 34395
If you wish to use a single regular expression to ensure a string contains exactly 2 instances of the word "cat" (no more, no less, and not counting "catastrophic" or "catcat"), then the following tested script will do the trick:
import re
text = r'The cat chased its cat toy, but failed to catch it.'
if re.match(r"""
    # Match string containing exactly n=2 "cat" words.
    ^                        # Anchor to start of string.
    (?:                      # Group for specific word count.
      (?:(?!\bcat\b).)*      # Zero or more non-"cat" chars,
      \bcat\b                # followed by the word "cat",
    ){2}                     # exactly n=2 times.
    (?:(?!\bcat\b).)*        # Zero or more non-"cat" chars.
    \Z                       # Anchor to end of string.
    """, text, re.DOTALL | re.VERBOSE):
    # Match attempt successful.
    print("Match found")
else:
    # Match attempt failed.
    print("No match found")
However, if you do wish to match the cat in "catastrophic" and "catcat", then remove all the \b word boundary anchors from the regex.
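To use this for any word and any n, the pattern can be built from its parts. This is a rough sketch of that idea. The function `exactly_n_regex` and its `whole_word` flag are names invented here for illustration, and passing `whole_word=False` corresponds to removing the \b anchors.

```python
import re

def exactly_n_regex(word, n, whole_word=True):
    # Hypothetical helper, not from the answer above.
    w = re.escape(word)
    if whole_word:
        w = r'\b' + w + r'\b'
    # Any run of characters in which the word does not start.
    filler = r'(?:(?!' + w + r').)*'
    pattern = r'^(?:' + filler + w + r'){' + str(n) + r'}' + filler + r'\Z'
    return re.compile(pattern, re.DOTALL)

words_only = exactly_n_regex('cat', 2)
anywhere = exactly_n_regex('cat', 2, whole_word=False)

ok = words_only.match('The blue cat talks to the red cat in the tree')
too_few = words_only.match('The cat is big')
too_many = words_only.match('cat cat cat')
glued = anywhere.match('The catcat runs')
```

With these inputs, `ok` and `glued` are match objects, while `too_few` and `too_many` are `None`.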
Upvotes: 0
Reputation: 45039
Don't use regular expressions just because they are there.
words = text.split()
print(words.count('cat'))
As Vincent points out, that assumes all words are separated by whitespace.
words = re.findall(r"\b\w+", text)
That is probably a better option, although whether it is necessary depends on details not provided in your post.
EDIT
If you don't even care about word boundaries, there is even less reason to be using a regular expression.
print(text.count("cat"))
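To turn either count into the yes/no answer the question asks for, compare it with n. A minimal sketch, assuming a made-up helper `has_exactly` with a `whole_word` switch (neither name comes from the post):

```python
import re

def has_exactly(text, word, n, whole_word=True):
    # Hypothetical helper: count occurrences, then compare with n.
    if whole_word:
        # Count only standalone words, e.g. not the "cat" in "catch".
        count = len(re.findall(r'\b' + re.escape(word) + r'\b', text))
    else:
        # Plain substring count (non-overlapping), so "catcat" counts twice.
        count = text.count(word)
    return count == n

two_cats = has_exactly('The blue cat talks to the red cat in the tree', 'cat', 2)
one_cat = has_exactly('The cat is big', 'cat', 2)
glued = has_exactly('The catcat runs', 'cat', 2, whole_word=False)
```

Here `two_cats` and `glued` are `True` and `one_cat` is `False`.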
Upvotes: 3
Reputation: 66970
How about this:
re.match(r'(.*\bcat\b){2}', 'The blue cat talks to the red cat in the tree')
The {2} means “repeat 2 times”. Use {7} for 7 repetitions.
The \b is a word boundary; in this case the cat in "blue cat talks" would match, but the one in "verification" wouldn't. And the .* will match any string.
You might want to go over the re documentation.
Upvotes: 2