I want to reduce the number of patterns I have to write by using one regex that matches any part of the pattern, or all of it, when it appears in a string.
Is this possible with Regex?
E.g. the pattern is: "the cat sat on the mat"
I would like the pattern to match the following strings:
"the"
"the cat"
"the cat sat"
...
"the cat sat on the mat"
But it should not match the following string because, although some words match, they are split by a non-matching word: "the dog sat"
Upvotes: 1
Views: 3173
Reputation: 3038
Perhaps it would be easier and more logical to think about the problem a little differently.
Instead of matching the pattern against the string, how about using the string as the pattern and looking for it in the pattern?
For example, with:
string = "the cat sat on"
pattern = "the cat sat on the mat"
string is always a substring of pattern, so it is simply a case of doing a regex match.
If that makes sense ;-)
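A minimal Python sketch of this reversed approach, assuming Python's `re` module; the names `PATTERN` and `occurs_in_pattern` are hypothetical. It uses `re.match` rather than `re.search` so that only leading parts of the pattern are accepted (`re.search` would also accept a middle fragment such as "cat sat"), and `re.escape` so that characters like "." in the input are taken literally.

```python
import re

# The full phrase that user input is compared against (hypothetical name).
PATTERN = "the cat sat on the mat"

def occurs_in_pattern(candidate: str) -> bool:
    # Use the input string as the regex; re.escape makes any regex
    # metacharacters in it match literally.
    # re.match only tries position 0, so the candidate must be a leading part.
    return re.match(re.escape(candidate), PATTERN) is not None

for s in ["the", "the cat sat on", "the cat sat on the mat", "the dog sat"]:
    print(repr(s), occurs_in_pattern(s))
```

Note that this also accepts a partial word such as "the c", since it compares text rather than whole words.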
Upvotes: 0
Reputation: 338208
This:
the( cat( sat( on( the( mat)?)?)?)?)?
would answer your question. Remove the "optional group" parentheses "(...)?" around parts that are not optional, and add additional groups for things that must match together.
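Since the goal is to write fewer patterns by hand, the nested form can also be generated from a list of words. This is a minimal Python sketch; `nested_prefix_regex` is a hypothetical helper, and `re.escape` is applied to each word only as a precaution in case a word contains regex metacharacters.

```python
import re

def nested_prefix_regex(words):
    # Build "w0( w1( w2( ...)?)?)?" from the inside out: each later word
    # becomes an optional group nested inside the previous word's group.
    inner = ""
    for word in reversed(words[1:]):
        inner = "( " + re.escape(word) + inner + ")?"
    return re.escape(words[0]) + inner

print(nested_prefix_regex("the cat sat on the mat".split()))
# the( cat( sat( on( the( mat)?)?)?)?)?
```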
the // complete match
the cat // complete match
the cat sat // complete match
the cat sat on // complete match
the cat sat on the // complete match
the cat sat on the mat // complete match
the dog sat on the mat // two partial matches ("the")
You might want to add some pre-condition, like a start-of-line anchor, to prevent the expression from matching the second "the" in the last line:
^the( cat( sat( on( the( mat)?)?)?)?)?
EDIT: If you add a post-condition, like the end-of-line anchor, the last example will not match at all:
the( cat( sat( on( the( mat)?)?)?)?)?$
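A small Python sketch comparing the unanchored expression with an anchored one. As an assumption for this sketch, each input is a single line tested on its own, so both the start-of-line pre-condition and the end-of-line post-condition are applied together; `NESTED`, `ANCHORED` and `check` are hypothetical names.

```python
import re

# The nested optional groups, written once.
NESTED = r"the( cat( sat( on( the( mat)?)?)?)?)?"

# Assumption: one candidate per line, so anchor at both ends.
ANCHORED = re.compile("^" + NESTED + "$")

def check(text: str) -> tuple:
    # Returns (unanchored match text, anchored match text); None means no match.
    unanchored = re.search(NESTED, text)
    anchored = ANCHORED.search(text)
    return (
        unanchored.group(0) if unanchored else None,
        anchored.group(0) if anchored else None,
    )

for text in ["the", "the cat sat", "the cat sat on the mat", "the dog sat on the mat"]:
    print(repr(text), check(text))
```

The unanchored search still reports a partial "the" for "the dog sat on the mat", while the anchored version rejects it.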
Credits for the tip go to VonC. Thanks!
The post-condition may of course be something else you expect to follow the match.
Alternatively, you could remove the last question mark:
the( cat( sat( on( the( mat)?)?)?)?)
Be aware, though: this would make a single "the" a non-match, so the first line would no longer match either.
Upvotes: 7
Reputation: 1324268
It could be fairly complicated:
(?ms)the(?=(\s+cat)|[\r\n]+)(?:\s+cat(?=(\s+sat)|[\r\n]+))?(?:\s+sat(?=(\s+on)|[\r\n]+))?(?:\s+on(?=(\s+the)|[\r\n]+))?(?:\s+the(?=(\s+mat)|[\r\n]+))?(?:\s+mat)?[\r\n]+
Meaning:
"the" only if followed by " cat" or end of line,
then " cat" (optional) only if followed by " sat" or end of line,
and so on for the remaining words.
It does match:
the cat sat on the mat
the cat
the cat sat
It does not match:
the cat sat aa on the mat (nothing is matched)
the dog sat (nothing is matched there either)
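A Python sketch running this expression over the example lines, with the optional word groups written in the non-capturing `(?:...)` form. It assumes the input is newline-terminated text, because the expression requires a trailing line break; `VONC`, `text` and `matches` are hypothetical names. Since `\s` also matches line breaks, a word group could in principle continue onto the next line, so this is best treated as a sketch rather than a robust solution.

```python
import re

# Each optional word group is non-capturing (?:...); the lookaheads only
# allow a word if the next word (or a line break) follows it.
VONC = re.compile(
    r"(?ms)the(?=(\s+cat)|[\r\n]+)"
    r"(?:\s+cat(?=(\s+sat)|[\r\n]+))?"
    r"(?:\s+sat(?=(\s+on)|[\r\n]+))?"
    r"(?:\s+on(?=(\s+the)|[\r\n]+))?"
    r"(?:\s+the(?=(\s+mat)|[\r\n]+))?"
    r"(?:\s+mat)?[\r\n]+"
)

text = (
    "the cat sat on the mat\n"
    "the cat\n"
    "the cat sat\n"
    "the cat sat aa on the mat\n"
    "the dog sat\n"
)

# group(0) includes the trailing line break, so strip it for display.
matches = [m.group(0).rstrip("\r\n") for m in VONC.finditer(text)]
print(matches)  # ['the cat sat on the mat', 'the cat', 'the cat sat']
```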
On second thought, Tomalak's answer is simpler (once fixed, that is, terminated with a '$').
I'm keeping mine as a wiki post.
Upvotes: 2
Reputation: 16249
If you know the match always begins at the first character, it would probably be faster to compare the characters directly in a loop. I don't think a regex will do it anyway.
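A minimal Python sketch of the character-by-character loop, assuming the candidate must start at the first character of the pattern. The final word-boundary check (so that "the c" is rejected) is an added assumption, not something stated in the answer; `is_leading_part` is a hypothetical name. In practice `pattern.startswith(candidate)` performs the same comparison in one call; the loop is spelled out here only to show the idea.

```python
def is_leading_part(candidate: str, pattern: str) -> bool:
    # A leading part can never be longer than the pattern itself.
    if len(candidate) > len(pattern):
        return False
    # Compare character by character, starting at the first position.
    for i, ch in enumerate(candidate):
        if ch != pattern[i]:
            return False
    # Assumption: the candidate must end on a word boundary (end of the
    # pattern or a following space), so partial words are rejected.
    return len(candidate) == len(pattern) or pattern[len(candidate)] == " "

PATTERN = "the cat sat on the mat"
for s in ["the cat sat", "the dog sat", "the c", PATTERN]:
    print(repr(s), is_leading_part(s, PATTERN))
```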
Upvotes: 1