DewinDell

Reputation: 1028

Regular Expression to match pattern once or more with no partial matches

Better explained with examples:

  1. HHH
  2. HHHH
  3. HHHBBHHH
  4. HHHBH
  5. BB
  6. HHBH

I need a regexp that matches only when the number of H's is 3 or a multiple of 3 (so 6, 9, 12, ... H's are OK as well, but 5 H's are not). If possible, I don't want to use Perl regexps.

So for the input above the regexp would match (1), (3) and (6) only. I'm just starting with regular expressions here so I don't exactly know how I'm supposed to approach this.

Edit: Just to clear something up, an H can belong to only one group of 3 H's. A group of 3 H's might be HHH or HHBH. That's why example 2 above is not a match: the last H is not in a group of 3 H's. And you can't form a group from the last 3 H's, because the middle 2 H's already belong to an earlier group.

Upvotes: 3

Views: 2383

Answers (3)

Jonathan Leffler

Reputation: 753655

Given the requirement that H's can be arbitrarily interleaved with non-H's, but that the total number of H's must be a non-zero multiple of 3 (so XXX, containing no H's, is not a match), the complete regular expression is anything but trivial. This is not a beginner's regular expression.

I'm going to assume that the dialect of regular expression treats {} and () as metacharacters for counting and grouping, and includes + for one-or-more. If you're using a regular expression system that has a different requirement (\{\}, for example) then adjust accordingly.

You need the regex to match the whole string, so there are no stray H's allowed. So, it must start with ^ and end with $. You need to allow an arbitrary number of non-H's at front and back. The H's may be separated by an arbitrary number of non-H's. That leads to:

^([^H]*H[^H]*H[^H]*H)+[^H]*$

Ouch; that is hard to read! It says the line must consist of 1 or more (+) groups of an arbitrary number of non-H's followed by an H, an arbitrary number of non-H's, another H, an arbitrary number of non-H's and a third H; all of which can be followed by an arbitrary number of non-H's.

Using the {} for counting:

^(([^H]*H){3})+[^H]*$

That's still hard to read. Note that my description said "arbitrary number of non-H's at front and back", but I only use the [^H]* at the back; that's because the repeating pattern allows an arbitrary number of non-H's at the front anyway so there's no need to repeat that fragment.
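
As a quick check, here is a minimal Python sketch that runs the counted pattern against the six examples from the question. Python's `re` module is only one possible engine, chosen for illustration; the helper name `matching_indices` is made up for this example.

```python
import re

# Pattern from the answer: one or more groups of three H's, where each H
# may be preceded by any number of non-H characters, followed by an
# optional tail of non-H characters.
PATTERN = re.compile(r"^(([^H]*H){3})+[^H]*$")

# The six example strings from the question, in order.
examples = ["HHH", "HHHH", "HHHBBHHH", "HHHBH", "BB", "HHBH"]


def matching_indices(strings):
    """Return the 1-based positions of the strings that PATTERN matches."""
    return [i for i, s in enumerate(strings, start=1) if PATTERN.search(s)]


print(matching_indices(examples))  # [1, 3, 6]
```

Only examples (1), (3) and (6) are accepted, which agrees with the expected result stated in the question.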

Upvotes: 2

Howard

Reputation: 39197

You can use the following regular expression:

^([^H]*H[^H]*H[^H]*H[^H]*)+$

It matches any string that contains a total of 3 H's, or any larger multiple of 3. Any other characters may appear between them.

Explanation:

^              beginning of string
(              start of group
  [^H]*H       any string of characters (or none) not including 'H' plus a single 'H'
  [^H]*H       any string of characters (or none) not including 'H' plus a single 'H'
  [^H]*H       any string of characters (or none) not including 'H' plus a single 'H'
  [^H]*        any string of characters (or none) which is not 'H'
)+             repeat the group one or more times
$              end of string

By repeating the subpattern [^H]*H three times, we make sure that each group contains exactly 3 H's; [^H]* allows any separating characters.
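
One way to gain confidence in the pattern is to compare it against a plain count of H's on every short string. The sketch below does this in Python for all strings of up to six characters over the alphabet "HB"; the function names `count_rule` and `disagreements` and the length limit are assumptions made for this illustration, not part of the answer.

```python
import re
from itertools import product

# The pattern from this answer, written out in full.
HOWARD = re.compile(r"^([^H]*H[^H]*H[^H]*H[^H]*)+$")


def count_rule(s):
    """The intended rule: the number of H's is a non-zero multiple of 3."""
    n = s.count("H")
    return n > 0 and n % 3 == 0


def disagreements(max_len=6, alphabet="HB"):
    """List every string up to max_len where the regex and the rule differ."""
    bad = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if bool(HOWARD.search(s)) != count_rule(s):
                bad.append(s)
    return bad


print(disagreements())  # []
```

An empty list means the regular expression and the counting rule agree on every string tested. This is a spot check on short inputs, not a proof.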

Note: either use egrep or run grep with the additional argument -E.

Upvotes: 3

Kendall Frey

Reputation: 44316

Use this to match a multiple of 3 H's:

(H{3})+

Here is a complete regex for your examples:

^(H{3})+B*(H{3})*$

Edit: It looks like you need to count non-consecutive H's. In that case:

^(([^H]*H){3})+[^H]*$

That should match any string with a multiple of 3 H's.
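
To illustrate how the three patterns in this answer differ, here is a small Python sketch. The constant names and the `accepts` helper are hypothetical, introduced only for this comparison.

```python
import re

# The three patterns from this answer, in the order they appear.
UNANCHORED = re.compile(r"(H{3})+")
CONSECUTIVE = re.compile(r"^(H{3})+B*(H{3})*$")
INTERLEAVED = re.compile(r"^(([^H]*H){3})+[^H]*$")


def accepts(pattern, s):
    """True if the pattern finds a match somewhere in s."""
    return pattern.search(s) is not None


# Without anchors, the first pattern matches part of "HHHH".
print(UNANCHORED.search("HHHH").group(0))  # HHH

# The second pattern requires consecutive H's, so it rejects example 6.
print(accepts(CONSECUTIVE, "HHBH"))  # False

# The final pattern counts non-consecutive H's and accepts it.
print(accepts(INTERLEAVED, "HHBH"))  # True
```

The unanchored form produces exactly the kind of partial match the question wants to avoid, which is why the later patterns start with ^ and end with $.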

Upvotes: 2
