camelCase

Reputation: 532

A regular expression for finding a word within a string exactly "n" times

I'm trying to write a regular expression in Python that finds a word within a string exactly "n" times.

For example, if I wanted an expression that matches only when the word "cat" appears exactly two times, how would I do that?

It should accept "The blue cat talks to the red cat in the tree", because it contains "cat" exactly two times.

But it should not accept "The cat is big", because it contains "cat" only once.

Nor should it accept "the dog is yellow", for similar reasons.

Thanks a lot

EDIT: Hey guys,

Sorry for complicating the problem, but I forgot to mention one thing.

If I wanted to find "cat" exactly two times, "The catcat runs" would also match.

Upvotes: 1

Views: 1831

Answers (5)

Raymond Hettinger

Reputation: 226316

Just build a regex with multiple instances of 'cat' separated by .*, which consumes the characters in between:

>>> import re
>>> n = 2
>>> regex = re.compile('.*'.join([r'\bcat\b'] * n))
>>> regex.search('The cat is big')
>>> regex.search('The blue cat talks to the red cat in the tree')
<_sre.SRE_Match object at 0x17ca1a8>
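
A minimal Python 3 sketch of the same pattern, also tried on a third input the question did not list ('cat and cat and cat'); the variable names are illustrative. It suggests the joined pattern accepts n *or more* occurrences, so on its own it does not enforce "exactly n". It also shows why the raw string matters.

```python
import re

n = 2
# Use raw strings: in a plain string literal, '\b' is a backspace character,
# not the regex word-boundary assertion.
regex = re.compile('.*'.join([r'\bcat\b'] * n))

one_cat = regex.search('The cat is big')
two_cats = regex.search('The blue cat talks to the red cat in the tree')
# A third "cat" is never examined once two have matched, so this still matches.
three_cats = regex.search('cat and cat and cat')

print(one_cat)                 # None
print(two_cats is not None)    # True
print(three_cats is not None)  # True
```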

Upvotes: 0

ridgerunner

Reputation: 34395

If you wish to use a single regular expression to ensure a string contains exactly 2 instances of the word "cat", (no more, no less, and not "catastrophic" or "catcat"), then the following tested script will do the trick:

import re
text = r'The cat chased its cat toy, but failed to catch it.'
if re.match(r"""
    # Match string containing exactly n=2 "cat" words.
    ^                    # Anchor to start of string.
    (?:                  # Group for specific word count.
      (?:(?!\bcat\b).)*  # Zero or more non-"cat" chars,
      \bcat\b            # followed by the word "cat",
    ){2}                 # exactly n=2 times.
    (?:(?!\bcat\b).)*    # Zero or more non-"cat" chars.
    \Z                   # Anchor to end of string.
    """, text, re.DOTALL | re.VERBOSE):
    # Match attempt successful.
    print("Match found")
else:
    # Match attempt failed.
    print("No match found")

However, if you do wish to match the "cat" in "catastrophic" and "catcat", then remove all the \b word boundary anchors from the regex.
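
The same tempered pattern can be generated for any word and count. The helper below (exactly_n, a hypothetical name) is a sketch assuming Python 3: it escapes the word with re.escape and drops the \b anchors when whole_word is False, which is the variant the question's edit ("The catcat runs" should match) calls for.

```python
import re

def exactly_n(word, n, whole_word=True):
    """Hypothetical helper: compile a regex matching strings that contain
    `word` exactly `n` times (the pattern above, parameterized)."""
    token = re.escape(word)
    if whole_word:
        token = r'\b' + token + r'\b'
    # Any run of characters that never starts an occurrence of the token.
    gap = '(?:(?!' + token + ').)*'
    pattern = '^(?:' + gap + token + '){' + str(n) + '}' + gap + r'\Z'
    return re.compile(pattern, re.DOTALL)

two_cats = exactly_n('cat', 2)
two_cats_anywhere = exactly_n('cat', 2, whole_word=False)

print(bool(two_cats.match('The blue cat talks to the red cat in the tree')))  # True
print(bool(two_cats.match('The cat is big')))                                 # False
print(bool(two_cats.match('The catcat runs')))                                # False
print(bool(two_cats_anywhere.match('The catcat runs')))                       # True
```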

Upvotes: 0

Winston Ewert

Reputation: 45039

Don't use regular expressions just because they are there.

words = text.split()
print(words.count('cat'))

As Vincent points out, that assumes all words are separated by whitespace (so "cat." or "cat," would not be counted).

words = re.findall(r"\b\w+\b", text)

That is probably a better option, although whether it is necessary depends on details not provided in your post.
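
A small Python 3 comparison of the two tokenizing approaches when punctuation is attached to a word; the sample text is made up for illustration.

```python
import re

text = 'The blue cat talks to the red cat. The cat!'

# str.split() only breaks on whitespace, so 'cat.' and 'cat!' stay as tokens.
split_count = text.split().count('cat')
# re.findall with \w+ extracts runs of word characters, dropping punctuation.
findall_count = re.findall(r'\b\w+\b', text).count('cat')

print(split_count)    # 1
print(findall_count)  # 3
```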

EDIT

If you don't even care about word boundaries, there is even less reason to be using a regular expression.

print(text.count("cat"))
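
A quick Python 3 check of what str.count reports; the sample strings are made up. It counts non-overlapping substrings anywhere, so "catcat" counts as two (as the question's edit wants), and so does the "cat" inside "catch".

```python
texts = ['The catcat runs', 'The cat is big', 'The cat tried to catch it']
# str.count returns the number of non-overlapping occurrences of 'cat'.
counts = [t.count('cat') for t in texts]
print(counts)  # [2, 1, 2]
```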

Upvotes: 3

ᅠᅠᅠ

Reputation: 66970

How about this:

re.match(r'(.*\bcat\b){2}', 'The blue cat talks to the red cat in the tree')

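The {2} means “repeat 2 times”; use {7} for 7 repetitions. The \b is a word boundary, so the cat in "blue cat talks" would match, but the "cat" inside "verification" wouldn't. The .* matches any run of characters except newlines (pass re.DOTALL if the text can span lines). Note that this checks for at least two occurrences: a third "cat" after the second is simply left unexamined.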
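
The newline caveat is easy to check in Python 3; the two-line sample string is made up for illustration.

```python
import re

pattern = r'(.*\bcat\b){2}'
text = 'The blue cat talks\nto the red cat'

# Without re.DOTALL, '.' stops at '\n', so the second repetition cannot reach
# the "cat" on the next line.
no_dotall = re.match(pattern, text)
# With re.DOTALL, '.' also matches '\n'.
with_dotall = re.match(pattern, text, re.DOTALL)

print(no_dotall)                # None
print(with_dotall is not None)  # True
```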

You might want to go over the re documentation.

Upvotes: 2

Vincent Savard

Reputation: 35927

Using re.findall and taking the len of the result seems like one solution.
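
A minimal Python 3 sketch of the findall + len idea. count_word is a hypothetical helper name; re.escape guards against regex metacharacters in the word, and the \b anchors restrict matches to whole words. Dropping them (whole_word=False) counts "catcat" as two.

```python
import re

def count_word(text, word, whole_word=True):
    """Hypothetical helper: count non-overlapping occurrences of `word`."""
    pattern = re.escape(word)
    if whole_word:
        pattern = r'\b' + pattern + r'\b'
    return len(re.findall(pattern, text))

samples = [
    'The blue cat talks to the red cat in the tree',
    'The cat is big',
    'the dog is yellow',
]
# Keep only the strings with exactly two whole-word "cat"s.
print([s for s in samples if count_word(s, 'cat') == 2])
print(count_word('The catcat runs', 'cat', whole_word=False))  # 2
```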

Upvotes: 2
