jorge

Regex - Multiline Problem

I think I'm burnt out, and that's why I can't see an obvious mistake. Anyway, I want the following regex:

#BIZ[.\s]*#ENDBIZ

to grab me the #BIZ tag, #ENDBIZ tag and all the text in between the tags. For example, if given some text, I want the expression to match:

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

At the moment, the regex matches nothing. What did I do wrong?

ADDITIONAL DETAILS

I'm doing the following in PHP:

preg_replace('/#BIZ[.\s]*#ENDBIZ/', 'my new text', $strMultiplelines);

Upvotes: 3

Views: 3822

Answers (8)

Ben Blank

Reputation: 56572

The dot loses its special meaning inside a character class — in other words, [.\s] means "match period or whitespace". I believe what you want is [\s\S], "match whitespace or non-whitespace".

preg_replace('/#BIZ[\s\S]*#ENDBIZ/', 'my new text', $strMultiplelines);
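To see the difference between the two character classes, here is a minimal sketch. The sample string and the variable names ($text, $broken, $fixed) are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample input modeled on the question's example.
$text = "before\n#BIZ\nsome text some test\nmore text\n#ENDBIZ\nafter";

// [.\s] matches only a literal period or whitespace, so the first letter
// after #BIZ stops the match and nothing is replaced.
$broken = preg_replace('/#BIZ[.\s]*#ENDBIZ/', 'my new text', $text);

// [\s\S] matches any character, newlines included.
$fixed = preg_replace('/#BIZ[\s\S]*#ENDBIZ/', 'my new text', $text);

var_dump($broken === $text); // bool(true): the input came back unchanged
echo $fixed, "\n";
```

With this input, $fixed keeps the "before" and "after" lines and replaces everything from #BIZ through #ENDBIZ.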

Edit: A bit about the dot and character classes:

By default, the dot does not match newlines. Most regex implementations have a way to make it match newlines as well, but the mechanism differs by implementation (in PHP's PCRE it is the s modifier). The only portable way to match truly any character is to pair a shorthand class with its negation: [\s\S], [\w\W], or [\d\D]. In my experience the first is the most common, probably because the construct is used when you need to match newlines, and including \s makes that intent clear.

Also, the dot isn't the only special character which loses its meaning in character classes. In fact, the only characters which are special in character classes are ^, -, \, and ]. Check out the "Metacharacters Inside Character Classes" section of the character classes page on Regular-Expressions.info.

Upvotes: 13

nonopolarity

Reputation: 151016

You can use:

preg_replace('/#BIZ.*?#ENDBIZ/s', 'my new text', $strMultiplelines);

The 's' modifier makes the dot match any character, including the newline character. The '?' makes the quantifier non-greedy, which matters in a case like:

foo

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

bar

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

hello world

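With the non-greedy quantifier, each #BIZ ... #ENDBIZ block is matched on its own, so the "bar" in the middle is left alone. A greedy .* would match from the first #BIZ to the last #ENDBIZ and remove "bar" along with everything else in between.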
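The greedy and non-greedy behaviors can be compared side by side. This is a small sketch with a shortened, made-up version of the two-block input above; $text, $greedy and $lazy are illustrative names.

```php
<?php
// Two tagged blocks separated by "bar" (shortened, hypothetical input).
$text = "foo\n#BIZ\nfirst block\n#ENDBIZ\nbar\n#BIZ\nsecond block\n#ENDBIZ\nhello world";

// Greedy: .* runs to the LAST #ENDBIZ, so "bar" is swallowed as well.
$greedy = preg_replace('/#BIZ.*#ENDBIZ/s', 'my new text', $text);

// Non-greedy: .*? stops at the FIRST #ENDBIZ after each #BIZ.
$lazy = preg_replace('/#BIZ.*?#ENDBIZ/s', 'my new text', $text);

echo "greedy:\n", $greedy, "\n\n";
echo "non-greedy:\n", $lazy, "\n";
```

The greedy version produces a single replacement between "foo" and "hello world"; the non-greedy version produces two replacements with "bar" still between them.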

Upvotes: 1

Alex Martelli

Reputation: 881665

Depending on the environment you're using your regex in, it may need special care to properly parse multiline text, e.g., re.DOTALL in Python. So what environment is that?

Upvotes: 1

D.Shawley

Reputation: 59563

Unless I am missing something, you handle this the same way that you would in Perl, with the /s modifier at the end (/m only changes where ^ and $ match, so /s is the one that lets the dot cross newlines)? Oddly enough, the other answers that rather correctly pointed this out got downvoted?!
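For anyone unsure how the two modifiers differ in PHP, here is a short sketch. The sample string and variable names are invented for the example.

```php
<?php
// Hypothetical multiline input.
$text = "#BIZ\nline one\nline two\n#ENDBIZ";

// /m only lets ^ and $ match at line boundaries; the dot still stops at "\n",
// so this pattern cannot span the lines between the tags.
$withM = preg_match('/#BIZ.*#ENDBIZ/m', $text);

// /s lets the dot match newlines, so the whole block is matched.
$withS = preg_match('/#BIZ.*#ENDBIZ/s', $text);

// What /m is actually for: anchoring at the start of every line.
$lineStarts = preg_match_all('/^line/m', $text);

var_dump($withM, $withS, $lineStarts); // int(0), int(1), int(2)
```

Note that the question's pattern also needs the [.\s] class replaced with a plain dot (or [\s\S]) before either modifier has any effect.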

Upvotes: 0

Daniel Brückner

Reputation: 59645

The mistake is the character class [.\s], which matches a literal dot (not any character) or whitespace. You probably wanted .* with . matching newline characters, too. You achieve this by enabling the single-line option ((?s:) does this in .NET regex).

(?s:#BIZ.*?#ENDBIZ)

Upvotes: 1

Robert Kozak

Reputation: 2033

This should work:

#BIZ[\s\S]*#ENDBIZ

You can try this online Regular Expression Testing Tool

Upvotes: 2

Beau Simensen

Reputation: 4578

// Replaces all of your code with "my new text", but I do not think
// this is actually what you want based on your description.
preg_replace('/#BIZ(.+?)#ENDBIZ/s', 'my new text', $contents);

// Actually "gets" the text, which is what I think you might be looking for.
preg_match('/(#BIZ)(.+?)(#ENDBIZ)/s', $contents, $matches);
list($dummy, $startTag, $data, $endTag) = $matches;
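If the input can contain more than one tagged block, preg_match_all can collect all of them. The following is only a sketch under that assumption; the sample $contents string and the $blocks variable are made up for illustration.

```php
<?php
// Hypothetical input with two tagged blocks.
$contents = "foo\n#BIZ\nfirst\n#ENDBIZ\nbar\n#BIZ\nsecond\n#ENDBIZ\n";

// Capture group 1 holds the text between each pair of tags.
preg_match_all('/#BIZ(.+?)#ENDBIZ/s', $contents, $matches);

// Strip the newlines that surround each captured block.
$blocks = array_map('trim', $matches[1]);

print_r($blocks);
```

Here $matches[0] holds the full matches including the tags, and $matches[1] holds only the inner text of each block.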

Upvotes: 2

Soviut

Reputation: 91545

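If you're doing a JavaScript regex, you can enable multiline mode by specifying the m flag at the end of the expression. Note that this flag only makes ^ and $ match at the start and end of each line; it does not make the dot match newlines: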

var re = /^deal$/mg 

Upvotes: -1
