mrblack

regex: Matching parts of a string when the string contains part of a regex pattern

I want to reduce the number of patterns I have to write by using a regex that picks up any or all of the pattern when it appears in a string.

Is this possible with Regex?

E.g. Pattern is: "the cat sat on the mat"

I would like the pattern to match the following strings:
"the"
"the cat"
"the cat sat"
...
"the cat sat on the mat"

But it should not match the following string because, although some words match, they are split by a non-matching word: "the dog sat"

Upvotes: 1

Views: 3173

Answers (4)

DEzra

Reputation: 3038

Perhaps it would be easier and more logical to think about the problem a little differently.

Instead of matching the pattern against the string, how about using the string as the pattern and looking for it inside the pattern?

For example where

string  = "the cat sat on"
pattern = "the cat sat on the mat"

A valid string is always contained in pattern, so the check is simply a regex match of string against pattern.

If that makes sense ;-)
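A minimal Python sketch of this idea. It assumes, as in the question's examples, that a valid input is a leading run of whole words; the function name `is_prefix_of` and the `(?= |$)` word-boundary lookahead are my own additions, not part of the answer. `re.escape` stops characters such as `.` or `?` in the input from being read as regex syntax.

```python
import re

PATTERN = "the cat sat on the mat"

def is_prefix_of(text: str, pattern: str = PATTERN) -> bool:
    """Use the input text as the regex and look for it in the pattern.

    re.escape() makes the text literal, re.match() anchors it at the start
    of the pattern, and the lookahead requires a space or the end of the
    pattern right after it, so a partial word like "the c" is rejected.
    """
    if not text:
        return False
    return re.match(re.escape(text) + r"(?= |$)", pattern) is not None

for s in ["the", "the cat sat", "the cat sat on the mat", "the dog sat", "the c"]:
    print(f"{s!r}: {is_prefix_of(s)}")
```

If the words may appear anywhere in the pattern rather than only at its start, `re.search` with `(?:^| )` placed in front of the escaped text would be the analogous check.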

Upvotes: 0

Tomalak

Reputation: 338208

This:

the( cat( sat( on( the( mat)?)?)?)?)?

would answer your question. Remove the "optional group" parentheses "(...)?" around parts that are not optional, and add extra groups for words that must match together.

the                       // complete match
the cat                   // complete match
the cat sat               // complete match
the cat sat on            // complete match
the cat sat on the        // complete match
the cat sat on the mat    // complete match
the dog sat on the mat    // two partial matches ("the")
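For longer phrases the nested groups can be generated rather than typed by hand. The following Python sketch (the helper name `nested_prefix_regex` is hypothetical) wraps each word after the first in an optional group nested inside the previous one, which reproduces the expression above, and then collects the matches for two of the example lines. `re.finditer` is used because `re.findall` would return the captured groups instead of the whole matches.

```python
import re

def nested_prefix_regex(words):
    """Build 'the( cat( sat( on( the( mat)?)?)?)?)?' from a word list.

    Each later word sits in an optional group nested inside the previous
    one, so a word can only match if every word before it matched too.
    """
    regex = ""
    for word in reversed(words[1:]):
        regex = f"( {re.escape(word)}{regex})?"
    return re.escape(words[0]) + regex

PATTERN = nested_prefix_regex("the cat sat on the mat".split())
print(PATTERN)  # the( cat( sat( on( the( mat)?)?)?)?)?

for line in ["the cat sat", "the dog sat on the mat"]:
    print(repr(line), [m.group(0) for m in re.finditer(PATTERN, line)])
# 'the cat sat' ['the cat sat']
# 'the dog sat on the mat' ['the', 'the']
```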

You might want to add a pre-condition, like a start-of-line anchor, to prevent the expression from matching the second "the" in the last line:

^the( cat( sat( on( the( mat)?)?)?)?)?

EDIT: If you add a post-condition, such as the end-of-line anchor, the last example won't match at all:

the( cat( sat( on( the( mat)?)?)?)?)?$

Credits for the tip go to VonC. Thanks!

The post-condition may of course be something else you expect to follow the match.
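A small Python check of how the two anchors behave, using the expression above. The third test string is my own addition: it shows that `$` alone still accepts a trailing "the" at the end of a line, so combining `^` and `$` (equivalently, `re.fullmatch`) may be what you actually want.

```python
import re

BODY = r"the( cat( sat( on( the( mat)?)?)?)?)?"

START_ONLY = re.compile("^" + BODY)   # pre-condition only
END_ONLY = re.compile(BODY + "$")     # post-condition only
BOTH = re.compile("^" + BODY + "$")   # the whole line must be a prefix

for line in ["the cat sat", "the dog sat on the mat", "the dog sat on the"]:
    print(repr(line),
          bool(START_ONLY.search(line)),
          bool(END_ONLY.search(line)),
          bool(BOTH.search(line)))
# 'the cat sat' True True True
# 'the dog sat on the mat' True False False
# 'the dog sat on the' True True False
```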

Alternatively, you can remove the last question mark:

the( cat( sat( on( the( mat)?)?)?)?)

Be aware, though, that this makes a lone "the" a non-match, so the first line would no longer match.

Upvotes: 7

VonC

Reputation: 1324268

It could be fairly complicated:

(?ms)the(?=(\s+cat)|[\r\n]+)(?:\s+cat(?=(\s+sat)|[\r\n]+))?(?:\s+sat(?=(\s+on)|[\r\n]+))?(?:\s+on(?=(\s+the)|[\r\n]+))?(?:\s+the(?=(\s+mat)|[\r\n]+))?(?:\s+mat)?[\r\n]+

Meaning:

  • I want "the" only if followed by "cat" or end of line
  • then I want "cat" (optional) only if followed by "sat" or end of line
  • and so on
  • followed by an end of line (which ensures a partial line such as "the cat walk..." is not matched)

It matches the first three lines below, but not the last two:

the cat sat on the mat
the cat
the cat sat
the cat sat aa on the mat (nothing is matched either)
the dog sat (nothing is matched there)
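A quick Python check of this pattern against the lines above, using non-capturing `(?:...)` groups. Note that, as written, the pattern requires a line break after each match (`[\r\n]+`), so the sample text ends with a newline; a final line without one would not match. The extra "the cat walk" line is my own addition to test the last bullet.

```python
import re

VONC = re.compile(
    r"(?ms)the(?=(\s+cat)|[\r\n]+)"
    r"(?:\s+cat(?=(\s+sat)|[\r\n]+))?"
    r"(?:\s+sat(?=(\s+on)|[\r\n]+))?"
    r"(?:\s+on(?=(\s+the)|[\r\n]+))?"
    r"(?:\s+the(?=(\s+mat)|[\r\n]+))?"
    r"(?:\s+mat)?[\r\n]+"
)

text = (
    "the cat sat on the mat\n"
    "the cat\n"
    "the cat sat\n"
    "the cat sat aa on the mat\n"
    "the dog sat\n"
    "the cat walk\n"
)

# Strip the trailing line break that each match consumes.
matches = [m.group(0).rstrip("\r\n") for m in VONC.finditer(text)]
print(matches)  # ['the cat sat on the mat', 'the cat', 'the cat sat']
```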


On second thought, Tomalak's answer is simpler (once fixed, that is, ended with a '$').
I'm keeping mine as a wiki post.

Upvotes: 2

Ray Hidayat

Reputation: 16249

If you know the match always begins at the first character, it would be much faster to compare the characters directly in a loop. I don't think a regex will do it anyway.
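A rough Python sketch of the loop approach, assuming the input must start at the first character of the pattern and end on a word boundary (the boundary check is my assumption, added so that "the c" is rejected). `matches_prefix` is a hypothetical name, and no timing comparison against the regex is implied.

```python
def matches_prefix(text: str, pattern: str = "the cat sat on the mat") -> bool:
    """Compare characters directly instead of using a regex.

    Accepts text only if it equals the start of pattern and stops on a
    word boundary (end of pattern or a following space).
    """
    if not text or len(text) > len(pattern):
        return False
    for i, ch in enumerate(text):
        if ch != pattern[i]:
            return False
    # Assumption: whole words only, so the next pattern char must be a space.
    return len(text) == len(pattern) or pattern[len(text)] == " "
```

In Python, `pattern.startswith(text)` performs the same character comparison in a single call; the explicit loop just mirrors the suggestion above.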

Upvotes: 1
