Mike Christensen

Reputation: 91726

Counting cats with regular expressions

So I want to match a string with the word "cat" in it a bunch of times, such as:

"cat cat cat cat cat"

or

"cat   cat cat  cat"

If there's anything else besides "cat" or whitespace, I don't want to match. So I can do:

^(cat\s*)+$

But I also want to find out how many cats appear in the string. One way to do this would be to count the number of groups; however, the above regular expression only gives me a single group with the first cat, not a capture per cat. Is there a way to do this using regular expressions?
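A Ruby-specific caveat about the pattern above (most answers below use Ruby): in Ruby, `^` and `$` match at line boundaries, not only at the ends of the string, so a multi-line string with a non-cat line can still pass the check. Anchoring with `\A` and `\z` avoids this. A minimal demonstration:

```ruby
loose  = /^(cat\s*)+$/    # the pattern from the question
strict = /\A(cat\s*)+\z/  # anchored to the whole string

input = "cat\ndog"
p loose.match?(input)   # => true: $ matches just before the "\n"
p strict.match?(input)  # => false
p strict.match?("cat   cat cat  cat")  # => true
```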

Upvotes: 2

Views: 3899

Answers (7)

yarden

Reputation: 1936

"cat   cat cat  cat".split.count { |w|
  # bail out of count, returning false, as soon as a word isn't "cat"
  break false unless w == 'cat'
  true
}

Upvotes: 0

sawa

Reputation: 168269

Note that Mike's original regexp, as well as Tomalak's, Marten's, and tagman's answers, all give the wrong count when the string includes consecutive instances of 'cat' (unless you want to consider 'catcat' as two instances of the word 'cat'). The following does not have this problem.

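class String
  # Returns the number of "cat"s if the string is only "cat"s separated
  # by spaces, otherwise nil. split with -1 keeps the leading and trailing
  # empty fields, so an adjacent "catcat" shows up as an empty delimiter.
  def count_if_match
    delimiters = strip.split('cat', -1)
    return nil if delimiters.size < 2 || !delimiters.first.empty? || !delimiters.last.empty?
    delimiters[1...-1].all? { |s| s =~ /\A +\z/ } ? delimiters.size - 1 : nil
  end
end

' cat   cat cat  cat'.count_if_match # => 4
' catcat cat cat'.count_if_match # => nil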
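An alternative that stays with regular expressions (a sketch, not part of sawa's answer): require at least one whitespace character between cats in the validation pattern, so "catcat" is rejected before anything is counted. The names `STRICT_CATS` and `strict_cat_count` are hypothetical.

```ruby
# Each "cat" must be separated from the next by at least one whitespace
# character; leading and trailing whitespace is allowed.
STRICT_CATS = /\A\s*cat(?:\s+cat)*\s*\z/

# Hypothetical helper: count only after the strict check passes
def strict_cat_count(str)
  str.scan(/cat/).size if str.match?(STRICT_CATS)
end

p strict_cat_count(" cat   cat cat  cat")  # => 4
p strict_cat_count(" catcat cat cat")      # => nil
```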

Upvotes: 3

the Tin Man

Reputation: 160631

I don't see anyone mentioning what I consider the obvious answer, using String#scan:

str = "cat cat cat    catcat"
str.scan('cat').size #=> 5

If you just have to use a regex:

str.scan(/cat/).size #=> 5

If you only want to count standalone occurrences, not run-together ones:

str.scan(/\bcat\b/).size #=> 3

EDIT:

@sawa points out that there is (considerable) room for misinterpretation of the OP's question. This covers the case where the OP doesn't want a count at all if anything besides "cat" and " " is in the string.

str.scan('cat').size if str.gsub(/(?:cat| )+/, '').empty? #=> 5

The other variations in my previous section can still be applied.

And since "whitespace" could mean more than a plain space, "\s" can be used in place of the literal space, e.g. /(?:cat|\s)+/.
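With "\s" in place of the literal space, the check also accepts tabs and newlines. Wrapped in a small helper (the name `cat_count` is hypothetical), it might look like this; note that, like the version above, it still counts "catcat" as two cats:

```ruby
# Hypothetical helper: count "cat"s only when the string contains nothing
# but "cat" and whitespace; otherwise return nil.
def cat_count(str)
  str.scan('cat').size if str.gsub(/(?:cat|\s)+/, '').empty?
end

p cat_count("cat\tcat\ncat")  # => 3
p cat_count("cat dog cat")    # => nil
p cat_count("catcat")         # => 2
```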

Upvotes: 5

rubyprince

Reputation: 17803

A Ruby way without regex would be:

string = "cat   cat cat  cat"
def match_cat(string)
  cat_array = string.split
  count = cat_array.size
  cat_array.uniq == ["cat"] ? count : false
end
match_cat(string)
=> 4

Upvotes: 0

Marten Veldthuis

Reputation: 1910

It's actually the last cat you're capturing. That happens because each repetition of the + overwrites the group's capture, so after the match only the last iteration survives. I don't think it's possible to get more than one capture out of a group in Ruby. The best thing you can do is probably:

str = "cat   cat cat  cat"

matchdata = str.match(/^((?:cat\s*)+)$/)
=> #<MatchData "cat   cat cat  cat" 1:"cat   cat cat  cat"> 

matchdata[0].split(/\s+/).size
=> 4
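To see which repetition the group keeps: in this string the first repetition would be "cat   " (with trailing spaces), while the last is a bare "cat". A quick check, with `String#scan` shown for contrast because it returns one entry per match:

```ruby
str = "cat   cat cat  cat"
m = str.match(/^(cat\s*)+$/)

p m[1]                     # => "cat"  (the last repetition, not "cat   ")
p m.captures.size          # => 1      (one group, one capture)
p str.scan(/cat\s*/).size  # => 4      (one entry per cat)
```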

Upvotes: 0

Tomalak

Reputation: 338406

You want to do two different things: validate a string and count word occurrences. Usually you cannot do both in one step.

var str   = "cat cat cat cat cat";
var count = 0;

if ( /^(cat\s*)+$/.test(str) ) {
  count = str.match(/cat/g).length;
}

In .NET regex, Group.Captures lists every occurrence a group matched, not just the last one as in other regex engines. There you could do both the validating and the counting in one step.

Upvotes: 2

vittt

Reputation: 96

Consider translating whitespace to newlines, then counting the lines that match the regexp.
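vittt's idea sketched in Ruby rather than with shell tools. The helper name `count_cat_lines` is hypothetical, and treating "every line must be exactly cat" as the validation rule is an assumption:

```ruby
# Hypothetical helper: put each whitespace-separated token on its own line,
# then count the lines that are exactly "cat". Returns nil if any line is
# something else (e.g. "catcat" or "dog") or if there are no lines at all.
def count_cat_lines(str)
  lines = str.strip.gsub(/\s+/, "\n").lines(chomp: true)
  matching = lines.grep(/\Acat\z/)
  matching.size if !lines.empty? && matching.size == lines.size
end

p count_cat_lines("cat   cat cat  cat")  # => 4
p count_cat_lines("catcat cat")          # => nil
```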

Upvotes: 0
