Reputation: 819
There are lots of answers to this question, but not a single complete one:
Using a single regular expression, how do you extract the page title from <title>Page title</title>?
Title tags can also be written in several other ways, such as:
<TITLE>Page title</TITLE>
<title>
Page title</title>
<title>
Page title
</title>
<title lang="en-US">Page title</title>
...or any combination of the above.
And it can be on its own line or in between other tags:
<head>
<title>Page title</title>
</head>
<head><title>Page title</title></head>
Thanks in advance for any help.
UPDATE: So, a regex might not be the best solution here. Which PHP-based HTML parser can handle all of these scenarios, whether the HTML is well formed or not?
UPDATE 2: sp00m's regex (https://stackoverflow.com/a/13510307/1844607) seems to be working in all cases. I'll get back to this if needed.
Upvotes: 6
Views: 8988
Reputation: 48837
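Use an HTML parser instead. But if you do need a regex, this one works as long as the case-insensitive (i) and dot-matches-newline (s) flags are set, so that <TITLE> and multi-line titles also match: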
<title[^>]*>(.*?)</title>
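As a rough sketch of how that pattern could be applied in PHP with preg_match: the function name extractTitleRegex and the sample strings are made up for illustration, and the trim() call is my own addition to drop the line breaks around multi-line titles. The i and s modifiers are what make the uppercase and multi-line variants from the question match.

```php
<?php
// Hypothetical helper; the name extractTitleRegex is not from the thread.
function extractTitleRegex(string $html): ?string
{
    // i: match <TITLE> as well as <title>
    // s: let . match newlines, so titles spread over several lines are captured
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m) === 1) {
        // Strip the whitespace left over from line breaks around the text
        return trim($m[1]);
    }
    return null; // no title element found
}

// Sample inputs modelled on the variants listed in the question
$samples = [
    '<TITLE>Page title</TITLE>',
    "<title>\nPage title</title>",
    "<title>\nPage title\n</title>",
    '<title lang="en-US">Page title</title>',
    '<head><title>Page title</title></head>',
];

foreach ($samples as $html) {
    echo extractTitleRegex($html), PHP_EOL;
}
```

Returning null when nothing matches lets the caller tell "no title" apart from an empty title. This is still pattern matching on text, so unusual markup (a title inside a comment, for example) can fool it.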
Upvotes: 12
Reputation:
Use the DOMDocument class:
$doc = new DOMDocument();
$doc->loadHTML($html);
$titles = $doc->getElementsByTagName("title");
// item() is a method, so it takes parentheses rather than square brackets
echo $titles->item(0)->nodeValue;
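For HTML that is not well formed, a slightly more defensive version might look like the sketch below. It assumes the DOM extension is enabled. The function name extractTitleDom is hypothetical, and the error suppression, empty-input guard and trim() are my additions, not part of the snippet above.

```php
<?php
// Hypothetical helper; the name extractTitleDom is not from the thread.
function extractTitleDom(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect libxml parse warnings internally instead of emitting them,
    // since real-world markup is often not well formed.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null;
    }

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null; // document has no title element
    }

    // nodeValue holds the element's text; trim the surrounding line breaks
    return trim((string) $titles->item(0)->nodeValue);
}

echo extractTitleDom("<head><title lang=\"en-US\">\nPage title\n</title></head>"), PHP_EOL;
```

Checking length before calling item(0) avoids dereferencing null when the page has no title at all.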
Upvotes: 2