user1788542

Regex to exclude content between title tag

What's wrong with this regex to exclude content of title tag?

$plaintext = preg_match('#<title>(.*?)</title>#', $html);

$html has html code of entire page.

Upvotes: 3

Views: 1425

Answers (3)

zx81

Reputation: 41838

It sounds like you never got a working answer. Let's remove the title tags.

Search: (?s)<title>.*?</title>

Replace: ""

Code:

// (?s) lets . match newlines, so a title spanning several lines is removed too
$regex = "~(?s)<title>.*?</title>~";
$replaced = preg_replace($regex, "", $pagecontent);

Explain Regex

(?s)                     # set flags for this block (with . matching
                         # \n) (case-sensitive) (with ^ and $
                         # matching normally) (matching whitespace
                         # and # normally)
<title>                  # '<title>'
.*?                      # any character (0 or more times (matching
                         # the least amount possible))
</title>                 # '</title>'
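
To make the difference from the question's attempt concrete, here is a minimal sketch. The sample `$html` string and the variable names `$found` and `$withoutTitle` are made up for illustration. `preg_match` only reports whether the pattern matched, while `preg_replace` returns the modified string:

```php
<?php
// Hypothetical sample page; in the question, $html holds the whole page source.
$html = "<html><head><title>My\nPage</title></head><body><p>Hello</p></body></html>";

// preg_match() returns 1 (match), 0 (no match) or false (error),
// never the page text, so it cannot be used to strip the title.
$found = preg_match('#<title>(.*?)</title>#s', $html);

// preg_replace() returns a copy of the string with each match replaced.
$withoutTitle = preg_replace('~(?s)<title>.*?</title>~', '', $html);

var_dump($found);
echo $withoutTitle, "\n";
```

Note that the pattern assumes a plain lowercase `<title>` tag with no attributes; a real page may need the `i` modifier or a looser opening tag.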

Upvotes: 5

Techsin

Reputation: 532

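I think it should look like this instead. This gives you only the content between the tags:

// the pattern needs delimiters (~ here); $matches[0] holds the text between the tags
preg_match('~(?<=<title>).*?(?=</title>)~s', $html, $matches);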

http://www.phpliveregex.com/p/1SJ

http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
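
If the goal is to read the title text rather than remove it, a capture group is a common alternative to lookarounds. This is only a sketch: the sample `$html` and the `$title` variable are assumptions, not part of the original answers.

```php
<?php
// Hypothetical sample page used only for this illustration.
$html = "<html><head><title>Example Page</title></head><body></body></html>";

// The third argument receives the matches: index 0 is the whole match,
// index 1 is the first capture group (the text between the tags).
// Modifiers: i = case-insensitive, s = let . match newlines.
if (preg_match('~<title>(.*?)</title>~is', $html, $matches) === 1) {
    $title = $matches[1];
} else {
    $title = null; // no title tag found
}

echo $title, "\n";
```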

Upvotes: 0

bassxzero

Reputation: 5041

This will get everything between the two tags:

// delimiters added; $matches[1] holds the text between the tags
preg_match('~<title>(.+?)</title>~s', $html, $matches);

Upvotes: 0
