Michael Sazonov

Reputation: 1533

RegExp confusing behavior

I am trying to match the following expression in PHP:

preg_match( '#<head>([\s\S]*)</head>#' , $this -> $html_file_content , $match );

with no success.

$this -> $html_file_content contains valid HTML code and the <head> block is not empty.

This regexp works just great in JavaScript.

I've already tried the following variations:

#<head>([.\s\S]*)</head>#
#<head>([.\s\S]+)</head>#
#<head>([\s\S]*)<\/head>#
#<head>([\s\S]+)<\/head>#

All of them worked just fine in JavaScript. I just cannot figure out why there is no match in PHP.

Could you help me figure this out?

Upvotes: 0

Views: 48

Answers (2)

Explosion Pills

Reputation: 191749

I can't reproduce this, so my guess is that $this -> $html_file_content should be

$this -> html_file_content
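
To illustrate the difference, here is a minimal sketch. The Page class and its sample content are hypothetical and not taken from the question. With a second $ after the arrow, PHP treats the expression as a variable property: it reads the property whose name is the string stored in the local variable $html_file_content. If no such local variable is defined, the subject handed to preg_match() is not the HTML you expect.

```php
<?php
// Hypothetical class, used only to illustrate the property-access difference.
class Page
{
    public $html_file_content = "<html><head>\n<title>Demo</title>\n</head><body></body></html>";

    public function extractHead()
    {
        // Plain property access: no "$" after the arrow.
        if (preg_match('#<head>([\s\S]*)</head>#', $this->html_file_content, $match)) {
            return $match[1];
        }
        return null;
    }

    public function extractHeadViaVariableName()
    {
        // "$this->$name" is a variable property: PHP reads the property
        // whose name is the string stored in $name.
        $name = 'html_file_content';
        if (preg_match('#<head>([\s\S]*)</head>#', $this->$name, $match)) {
            return $match[1];
        }
        return null;
    }
}

$page = new Page();
$head = $page->extractHead();
$sameHead = $page->extractHeadViaVariableName();
```

Both methods return the same captured text here, because $name happens to hold the right property name. In the question there is no such local variable, which would explain the failed match.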

In case that doesn't work, maybe try it with SimpleXML assuming that the contents are well-formed.

$html = new SimpleXMLElement($html_file_content);
$head = "" . $html->head;
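
One caveat, sketched below with an assumed sample document: casting a SimpleXMLElement to a string yields only the text directly inside that element, not its child markup. If the markup of the head block is what you need, asXML() serializes the element together with its children.

```php
<?php
// Assumed sample input; it must be well-formed XML for SimpleXMLElement to accept it.
$html_file_content = '<html><head><title>Demo</title><meta charset="utf-8"/></head><body/></html>';

$html = new SimpleXMLElement($html_file_content);

// String cast: only the text nodes directly inside <head>.
$headText = (string) $html->head;

// asXML(): the <head> element serialized together with its child elements.
$headMarkup = $html->head->asXML();
```

Note that the asXML() result includes the surrounding head tags, so strip them if only the inner content is wanted.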

The common answer of "you should not use regex to parse HTML" is misguided. As far as I can tell you're not trying to parse the HTML; you just want to extract a string value from known delimiters. No one would have said anything if the delimiters were [head]...[/head].

Upvotes: 1

Rickert

Reputation: 1675

Your pattern says that there must be only one or more whitespace elements, but you need to match any text, including spaces. Try (.*?) instead. I haven't tested it, but I hope this is right:

/<head>(.*?)<\/head>/s
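
A quick way to compare the candidate patterns is to run them against a small multi-line sample. The sketch below uses an assumed sample string, not the asker's file. In PCRE the dot does not match a newline unless the s (PCRE_DOTALL) modifier is set, whereas the class [\s\S] matches any character, newlines included.

```php
<?php
// Assumed sample document with line breaks inside <head>.
$html = "<html><head>\n<title>Demo</title>\n</head><body></body></html>";

// Without the "s" modifier, "." stops at newlines, so a multi-line head does not match.
$withoutS = preg_match('/<head>(.*?)<\/head>/', $html, $m1);

// With the "s" modifier, "." also matches newlines.
$withS = preg_match('/<head>(.*?)<\/head>/s', $html, $m2);

// [\s\S] matches any character regardless of modifiers.
$withClass = preg_match('#<head>([\s\S]*)</head>#', $html, $m3);
```

On this sample the original [\s\S] pattern matches, which suggests that the pattern itself is not what fails in the question.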

Upvotes: 0
