I think I'm burnt out, and that's why I can't see an obvious mistake. Anyway, I want the following regex:
#BIZ[.\s]*#ENDBIZ
to grab the #BIZ tag, the #ENDBIZ tag, and all the text in between. For example, given the following text, I want the expression to match all of it:
#BIZ
some text some test
more text
maybe some code
#ENDBIZ
At the moment, the regex matches nothing. What did I do wrong?
I'm doing the following in PHP:
preg_replace('/#BIZ[.\s]*#ENDBIZ/', 'my new text', $strMultiplelines);
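For reference, here is a minimal PHP sketch that reproduces the symptom. The input is assumed to be just the sample block above; the variable name $strMultiplelines is reused from the question.

```php
<?php
// Assumed input: the example block from the question.
$strMultiplelines = <<<TXT
#BIZ
some text some test
more text
maybe some code
#ENDBIZ
TXT;

// [.\s] is a character class: it only allows a literal "." or whitespace,
// so [.\s]* stops at the "s" of "some" and #ENDBIZ can never follow.
$matched = preg_match('/#BIZ[.\s]*#ENDBIZ/', $strMultiplelines);
var_dump($matched); // int(0)

// With no match, preg_replace hands back the input unchanged.
$result = preg_replace('/#BIZ[.\s]*#ENDBIZ/', 'my new text', $strMultiplelines);
var_dump($result === $strMultiplelines); // bool(true)
```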
Upvotes: 3
Views: 3822
Reputation: 56572
The dot loses its special meaning inside a character class; in other words, [.\s] means "match a period or whitespace". I believe what you want is [\s\S], "match whitespace or non-whitespace".
preg_replace('/#BIZ[\s\S]*#ENDBIZ/', 'my new text', $strMultiplelines);
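A short PHP sketch of this fix, assuming a made-up input with some text before and after a single block:

```php
<?php
// Hypothetical input: one #BIZ block surrounded by other lines.
$strMultiplelines = "foo\n#BIZ\nsome text some test\nmore text\nmaybe some code\n#ENDBIZ\nbar";

// [\s\S] matches any character, newlines included, so the whole block is consumed.
$result = preg_replace('/#BIZ[\s\S]*#ENDBIZ/', 'my new text', $strMultiplelines);

echo $result, "\n";
// foo
// my new text
// bar
```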
Edit: A bit about the dot and character classes:
By default, the dot does not match newlines. Most (all?) regex implementations have a way to make it match newlines as well, but the mechanism differs by implementation. The only portable way to match (really) any character is to pair a shorthand class with its negation: [\s\S], [\w\W], or [\d\D]. In my personal experience, the first seems to be the most common, probably because this trick is used when you need to match newlines, and including \s makes it clear that you're doing so.
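A small PHP sketch of the points above, using a made-up two-line string: the plain dot fails at the newline, PHP's /s modifier changes that, and each shorthand/negation pair matches across the line break without any modifier.

```php
<?php
$text = "line one\nline two";

// By default the dot does not match "\n", so this fails.
$dotResult = preg_match('/one.line/', $text);       // 0
// In PHP (PCRE), the /s modifier lets the dot match newlines.
$dotAllResult = preg_match('/one.line/s', $text);   // 1

// Each "shorthand class + its negation" pair matches any character, newline included.
$pairResults = [];
foreach (['[\s\S]', '[\w\W]', '[\d\D]'] as $anyChar) {
    $pairResults[$anyChar] = preg_match('/one' . $anyChar . 'line/', $text); // 1 each
}
print_r($pairResults);
```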
Also, the dot isn't the only special character that loses its meaning in character classes. In fact, the only characters that are special in character classes are ^, -, \, and ]. Check out the "Metacharacters Inside Character Classes" section of the character classes page on Regular-Expressions.info.
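A small sketch in PHP's PCRE flavour (the sample strings are made up for illustration) showing which characters stay special inside a character class and which become plain literals:

```php
<?php
// Inside [...], characters such as . * + ? $ ( ) are plain literals.
$literals = preg_match('/^[.*+?$()]+$/', '.*+?$()');  // 1
$dotIsLiteral = preg_match('/^[.]$/', 'a');           // 0: [.] is not "any character"

// The ones that stay special:
$negated = preg_match('/^[^abc]$/', 'x');             // 1: a leading ^ negates the class
$range = preg_match('/^[a-c]+$/', 'abcabc');          // 1: - between two chars forms a range
$escaped = preg_match('/^[\]\-]+$/', ']-');           // 1: \ escapes ] and - to make them literal
```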
Upvotes: 13
Reputation: 151016
You can use:
preg_replace('/#BIZ.*?#ENDBIZ/s', 'my new text', $strMultiplelines);
The s modifier makes the dot match anything, even the newline character. The ? makes the * non-greedy (lazy), which matters in a case like:
foo
#BIZ
some text some test
more text
maybe some code
#ENDBIZ
bar
#BIZ
some text some test
more text
maybe some code
#ENDBIZ
hello world
With the non-greedy quantifier, each #BIZ ... #ENDBIZ block is replaced separately, so the "bar" in the middle is not removed.
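To make the greedy/lazy difference concrete, here is a PHP sketch that runs both versions over the foo/bar/hello world sample above, built inline as a string:

```php
<?php
$text = "foo\n#BIZ\nsome text some test\nmore text\nmaybe some code\n#ENDBIZ\n"
      . "bar\n#BIZ\nsome text some test\nmore text\nmaybe some code\n#ENDBIZ\n"
      . "hello world";

// Greedy: .* runs to the LAST #ENDBIZ, so "bar" is swallowed as well.
$greedy = preg_replace('/#BIZ.*#ENDBIZ/s', 'my new text', $text);

// Lazy: .*? stops at the FIRST #ENDBIZ after each #BIZ, so each block is replaced on its own.
$lazy = preg_replace('/#BIZ.*?#ENDBIZ/s', 'my new text', $text);

echo $greedy, "\n---\n", $lazy, "\n";
```

With only one block in the input, a greedy pattern such as #BIZ[\s\S]*#ENDBIZ gives the same result; the difference only shows up when several blocks appear.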
Upvotes: 1
Reputation: 881665
Depending on the environment you're using your regex in, it may need special care to handle multiline text, e.g. re.DOTALL in Python. So which environment are you using?
Upvotes: 1
Reputation: 59563
Unless I am missing something, you handle this the same way that you would in Perl, with the /s modifier at the end (/m only changes what ^ and $ match, so on its own it won't let the dot cross newlines). Oddly enough, the other answers that rather correctly pointed this out got downvoted?!
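A quick PHP sketch (the sample string is made up for illustration) contrasting the two Perl-style modifiers: /s is the one that changes the dot, while /m only changes the anchors ^ and $.

```php
<?php
$text = "#BIZ\nsome text\n#ENDBIZ";

// /m only affects ^ and $; the dot still refuses to match "\n".
$withM = preg_match('/#BIZ.*#ENDBIZ/m', $text);   // 0
// /s ("dotall") is what lets the dot match "\n".
$withS = preg_match('/#BIZ.*#ENDBIZ/s', $text);   // 1

// Where /m does matter: ^ and $ can match at line boundaries.
$anchorNoM = preg_match('/^#ENDBIZ$/', $text);    // 0: ^ only matches at the very start
$anchorM = preg_match('/^#ENDBIZ$/m', $text);     // 1: ^ also matches after a newline
```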
Upvotes: 0
Reputation: 59645
The mistake is the character group [.\s], which matches a dot (not any character) or whitespace. You probably wanted .* with the . matching newline characters, too. You achieve this by enabling the single-line option ((?s:) does this in .NET regex):
(?s:#BIZ.*?#ENDBIZ)
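PCRE, which PHP's preg_* functions use, also understands the inline (?s:...) group, so the same pattern can be dropped into the question's call without a trailing /s. A sketch with a made-up input:

```php
<?php
$text = "foo\n#BIZ\nsome text\n#ENDBIZ\nbar";

// (?s:...) turns on dotall only for the part inside the group,
// so no /s modifier is needed after the closing delimiter.
$result = preg_replace('/(?s:#BIZ.*?#ENDBIZ)/', 'my new text', $text);

echo $result, "\n";
```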
Upvotes: 1
Reputation: 2033
This should work:
#BIZ[\s\S]*#ENDBIZ
You can try it in this online Regular Expression Testing Tool.
Upvotes: 2
Reputation: 4578
// Replaces the whole block (tags included) with "my new text", but I do not think
// this is actually what you want based on your description.
// preg_replace returns the new string; it does not modify $contents in place.
$newContents = preg_replace('/#BIZ(.+?)#ENDBIZ/s', 'my new text', $contents);
// Actually "gets" the text, which is what I think you might be looking for.
// preg_match returns 1 on a match; only then does $matches hold the groups.
if (preg_match('/(#BIZ)(.+?)(#ENDBIZ)/s', $contents, $matches)) {
    list(, $startTag, $data, $endTag) = $matches;
}
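If the input can contain several blocks, preg_match only returns the first one. A sketch that swaps in preg_match_all (a substitution, not what the answer above uses) to collect the text of every block, with a made-up two-block input:

```php
<?php
$contents = "#BIZ\nfirst block\n#ENDBIZ\nbar\n#BIZ\nsecond block\n#ENDBIZ";

// preg_match_all collects every match; with the default PREG_PATTERN_ORDER,
// $matches[0] holds the full matches and $matches[1] the text between the tags.
$count = preg_match_all('/#BIZ(.+?)#ENDBIZ/s', $contents, $matches);

foreach ($matches[1] as $i => $body) {
    echo "Block $i: ", trim($body), "\n";
}
```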
Upvotes: 2
Reputation: 91545
It looks like you're writing a JavaScript regex; you'll need to enable multiline mode by specifying the m flag at the end of the expression:
var re = /^deal$/mg
Upvotes: -1