Jari

Reputation: 819

Regular expression to get page title

There are lots of answers to this question, but not a single complete one:

Using a single regular expression, how do you extract the page title from <title>Page title</title>?

Title tags can also be written in several other ways, such as:

<TITLE>Page title</TITLE>

<title>
 Page title</title>
<title>
 Page title
</title>

<title lang="en-US">Page title</title>

...or any combination of the above.

And it can be on its own line or in between other tags:

<head>
  <title>Page title</title>
</head>

<head><title>Page title</title></head>

Thanks in advance for your help.

UPDATE: So, the regex approach might not be the best solution to this. Which PHP-based HTML parser could handle all scenarios, whether the HTML is well formed or not?

UPDATE 2: sp00m's regex (https://stackoverflow.com/a/13510307/1844607) seems to be working in all cases. I'll get back to this if needed.

Upvotes: 6

Views: 8988

Answers (3)

F11

Reputation: 3816

Use this regex:

<title>[\s\S]*?</title>

Upvotes: 0

sp00m

Reputation: 48837

Use an HTML parser instead. But in case you still want a regex:

<title[^>]*>(.*?)</title>
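
As a rough sketch of how this pattern might be applied in PHP: it probably needs the `i` modifier to match `<TITLE>` and the `s` modifier so that `.` spans line breaks, otherwise the multi-line variants from the question would not match. The function name `extractTitle` is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper, not part of the original answer.
function extractTitle(string $html): ?string
{
    // i: case-insensitive, so <TITLE> matches too.
    // s: "." also matches newlines, so multi-line titles are captured.
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $matches) === 1) {
        // Strip the whitespace left over when the title sits on its own line.
        return trim($matches[1]);
    }
    // No title element found.
    return null;
}
```

Called as `extractTitle($html)`, this returns the trimmed title text or `null`. It is still a regex over markup, so it can be fooled by things like a `<title>` string inside a comment or script.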


Upvotes: 12

user1726343

Reputation:

Use the DOMDocument class:

$doc = new DOMDocument();
$doc->loadHTML($html);
$titles = $doc->getElementsByTagName("title");
echo $titles->item(0)->nodeValue; // item() is a method, not an array property
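
For HTML that is not well formed, a slightly more defensive variant might look like the sketch below. It assumes libxml warnings should be suppressed rather than printed, and it guards against a missing title element. The function name `titleFromHtml` is hypothetical.

```php
<?php
// Hypothetical helper, shown only as a sketch of the DOMDocument approach.
function titleFromHtml(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) returns null when the document has no title element.
    $node = $doc->getElementsByTagName('title')->item(0);

    return $node === null ? null : trim($node->textContent);
}
```

The parser normalizes tag names, so `<TITLE>` and `<title lang="en-US">` should be found by the same lookup. Non-ASCII pages may need extra care with the input encoding, which this sketch does not address.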

Upvotes: 2
