Reputation: 14445
I’m trying to create a regex for form validation, but it always returns true. The user must be able to add something like {user|2|S} as input but also use brackets if they are escaped with \.
This code checks for the left bracket { for now.
$regex = '/({(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)}))|[^{]|(?<=\\\){)*/';
if (preg_match($regex, $value)) {
    return TRUE;
} else {
    return FALSE;
}
A possible correct input would be:
Hello {user|1|S}, you have {amount|2|D2}
or
Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}
However, this should return false:
Hello {user|1|S}, you have {amount|2}
and this also:
Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}
A live example can be found here: http://regexr.com?37tpu Note that there is a \ in the lookbehind at the end: PHP was giving me error messages, so I had to escape it an extra time in my code.
Upvotes: 1
Views: 123
Reputation: 8661
Looks more like a job for a negative lookbehind to me:
/((?<!\\\\)\{[a-zA-Z0-9]+\|[0-9]+\|[SD][0-9]*\})/
However, the obfuscation factor is so high that I would rather recognize all bracketed strings and parse them later.
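The "recognize first, parse later" idea could be sketched roughly like this. It is only an illustration: placeholdersAreValid is a made-up name, the inner pattern is the one from the regex above, and a stray unmatched brace is not detected.

```php
<?php
// Hypothetical helper (not from the original post): returns true when every
// unescaped {...} group in $value is a well-formed placeholder.
function placeholdersAreValid(string $value): bool
{
    // Step 1: collect every {...} group whose opening brace is not
    // preceded by a backslash.
    preg_match_all('/(?<!\\\\)\{([^{}]*)\}/', $value, $matches);

    // Step 2: check the content of each group on its own.
    foreach ($matches[1] as $inner) {
        if (preg_match('/^[a-zA-Z0-9]+\|[0-9]+\|[SD][0-9]*$/', $inner) !== 1) {
            return false;
        }
    }
    return true;
}
```

Splitting the work this way keeps each pattern short, and the second step can later be replaced by ordinary string handling such as explode('|', $inner).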
Upvotes: 0
Reputation: 46350
You can write a regex for this without lookbehinds or lookaheads (avoiding them is usually recommended).
For example, if your requirement is to match any character except { and }, unless the brace is preceded by a \, you can also phrase it as:
Match any character except { and }, OR match \{ or \}.
To match any character except { and }, use:
[^{}]
To match \{, use:
\\\{
The first backslash escapes the second, so together they match a literal backslash; the third backslash escapes the { (which might not be necessary, depending on your regex compiler).
You would end up with this:
(?:
[^{}]
|
\\\{
|
\\\}
)+
I formatted this regex across several lines so that it's readable. If you want to use it in your code in this form, make sure to use the PCRE_EXTENDED modifier (x).
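In PHP, the multi-line form could be used roughly as follows. This is a sketch with some assumptions: the ^ and $ anchors and the function name isEscapedText are additions for illustration, and the pattern only covers the escaping part, so it rejects any unescaped brace, including valid placeholders.

```php
<?php
// Hypothetical helper: true when $value contains no unescaped { or }.
function isEscapedText(string $value): bool
{
    // The trailing x (PCRE_EXTENDED) makes PCRE ignore the whitespace and
    // the "# ..." comments inside the pattern. In a single-quoted PHP
    // string, \\\\\{ reaches the regex engine as \\\{.
    $regex = '/^(?:
        [^{}]     # any character except the two braces
      |
        \\\\\{    # a backslash followed by an opening brace
      |
        \\\\\}    # a backslash followed by a closing brace
    )+$/x';

    return preg_match($regex, $value) === 1;
}
```

For example, isEscapedText('in \{the_bracket_bank\}') is true, while isEscapedText('in {the_bracket_bank}') is false.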
Upvotes: 1
Reputation: 13581
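The main error is that you do not specify that the regex should match from the beginning to the end of the checked string. Because the whole group is quantified with *, it can match an empty string at the start of any input, so preg_match always reports a match. Use the ^ and $ assertions.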
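I think you should also escape { and } in your regex, as they have special meaning: together they delimit a quantifier such as {2,3}. PCRE treats a { that does not start a valid quantifier as a literal, so this is not strictly required, but escaping makes the intent explicit.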
The (?<=\\\) is better written as (?<=\\\\). The backslash has to be escaped twice, as it has special meaning in both single-quoted strings and PCRE regexes. Using \\\ works too: in a single-quoted string, any backslash sequence other than \\ and \' is kept as a literal backslash followed by the character, so \) is taken literally. But explicitly escaping the backslash twice seems easier to read to me.
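A quick way to convince yourself that both spellings give the same pattern text is to compare the two string literals directly. The variable names here are only for illustration.

```php
<?php
// Both single-quoted literals produce the same 7-character string: (?<=\\)
$threeBackslashes = '(?<=\\\)';   // \\ becomes \, and \) is kept as \)
$fourBackslashes  = '(?<=\\\\)';  // \\ becomes \, twice

var_dump($threeBackslashes === $fourBackslashes); // bool(true)
var_dump(strlen($fourBackslashes));               // int(7)
```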
The regex should be
$regex = '/^(\{(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\}))|[^{]|(?<=\\\\)\{)*$/';
But notice that the look-around assertions are not necessary. This regex should do the job too:
$regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';
Any non-{ character is matched by the first alternative. When a { is read, one of the remaining two alternatives is used: either the pattern for the brace placeholder matches, or the regex engine backtracks one character and tries to match the \{ character sequence. If both fail, it backtracks further until it reaches the start of the string and fails completely.
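As a sanity check, the look-around-free pattern can be run against the four sample inputs from the question. The wrapper name isValidTemplate is made up for this sketch. Like the question's code, the pattern only checks opening braces, so a stray } is still accepted.

```php
<?php
// Hypothetical wrapper around the look-around-free pattern above.
function isValidTemplate(string $value): bool
{
    $regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($regex, $value) === 1;
}

// The four sample inputs from the question.
$samples = [
    'Hello {user|1|S}, you have {amount|2|D2}',
    'Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}',
    'Hello {user|1|S}, you have {amount|2}',
    'Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}',
];

foreach ($samples as $sample) {
    echo var_export(isValidTemplate($sample), true), '  ', $sample, PHP_EOL;
}
```

The first two inputs should print true and the last two false, which is the behaviour the question asks for.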
Upvotes: 1