curiousgeorge

Reputation: 361

PHP Regex dot matches new line alternative

I am trying to come up with a regex to grab all the text between two HTML tags. This is what I have so far:

<TAG[^>]*>(.*?)</TAG>

In theory, this should work perfectly. But running it through PHP's preg_replace with the flags /ims results in the WHOLE string being matched.

If I remove the /s flag, it works perfectly, but the real tags have newlines between them. Is there a better way of approaching this?

Upvotes: 1

Views: 995

Answers (2)

fvox

Reputation: 1087

I don't recommend using regex to match full HTML, but you can use the "dotall" flag: /REGEXP/s

Example:

$str = "<tag>
fvox
</tag>";

// The '/' in </TAG> must be escaped because '/' is also the pattern delimiter.
preg_match_all('/<TAG[^>]*>(.*?)<\/TAG>/is', $str, $r);
print_r($r); // dump the matches
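
If the /s flag is not an option, a character class such as [\s\S] matches any character, newlines included, without it. This is only a minimal sketch: the sample markup, the `#` delimiter, and the names `$html` and `$m` are illustrative assumptions, not taken from the question.

```php
<?php
// Sample input with newlines between the tags (illustrative).
$html = "<tag>\nfirst\n</tag>\n<tag class=\"x\">second</tag>";

// [\s\S] matches any character, newlines included, so the s flag is not needed.
// '#' is used as the delimiter so the '/' in </tag> needs no escaping.
preg_match_all('#<tag[^>]*>([\s\S]*?)</tag>#i', $html, $m);

print_r($m[1]); // captured contents of each <tag> element
```

The lazy quantifier `*?` keeps each match from running past the first closing tag it meets.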

Upvotes: 1

user142162

Of course there's a better way. Don't parse HTML with regex.

DOMDocument should be able to accommodate you better:

$dom = new DOMDocument();
$dom->loadHTMLFile('filename.html');

$tags = $dom->getElementsByTagName('tag');

echo $tags->item(0)->textContent; // Contents of the first `tag` element

You may have to tweak the above code (hasn't been tested).
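
If the markup is already in a string rather than a file, the same idea might look like the sketch below. It assumes a placeholder element named `tag`, as in the question; the sample markup and the names `$html` and `$contents` are hypothetical.

```php
<?php
// Sample markup held in a string (illustrative).
$html = "<div><tag>\nfirst\n</tag><tag class=\"x\">second</tag></div>";

// Non-standard element names such as <tag> make libxml report errors;
// collect them internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Gather the trimmed text of every <tag> element, newlines and all.
$contents = [];
foreach ($dom->getElementsByTagName('tag') as $node) {
    $contents[] = trim($node->textContent);
}

print_r($contents);
```

Because the parser handles newlines and attributes itself, no regex flags are involved at all.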

Upvotes: 3
