Reputation:
What's wrong with this regex to exclude content of title tag?
$plaintext = preg_match('#<title>(.*?)</title>#', $html);
$html holds the HTML source of the entire page.
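For reference, the API behaviour behind the snippet above: preg_match() returns the number of matches (1 or 0, or false on error), not the matched text, so $plaintext ends up holding an integer. The captured text is written into the optional third argument. A minimal sketch, using a made-up $html string as an assumption in place of a real page:

```php
<?php
// Hypothetical sample page; in the question, $html holds the real page source.
$html = '<html><head><title>My Page</title></head><body>Hello</body></html>';

// preg_match() returns 1 if the pattern matched, 0 if not (false on error).
$found = preg_match('#<title>(.*?)</title>#', $html, $matches);

// $matches[0] is the full match, $matches[1] is the first capture group.
var_dump($found);       // int(1)
var_dump($matches[1]);  // string(7) "My Page"
```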
Upvotes: 3
Views: 1425
Reputation: 41838
It sounds like you never got a working answer. Let's remove the title element: the tags and everything between them.
Search: (?s)<title>.*?</title>
Replace: ""
Code:
$regex = "~(?s)<title>.*?</title>~";
$replaced = preg_replace($regex, "", $html); // $html = full page source
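A self-contained run of the same replacement, as a sketch on a made-up snippet (the sample $html is an assumption) whose title text spans two lines:

```php
<?php
// Hypothetical sample input standing in for the real page source.
$html = "<html><head><title>My\nPage</title><meta charset=\"utf-8\"></head><body>Hello</body></html>";

// (?s) lets . match newlines; .*? stops at the first </title>.
$regex = "~(?s)<title>.*?</title>~";

// Remove the whole title element (tags and text); preg_replace returns the new string.
$replaced = preg_replace($regex, "", $html);

echo $replaced, "\n";
// <html><head><meta charset="utf-8"></head><body>Hello</body></html>
```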
Explain Regex
(?s) # set flags for this block (with . matching
# \n) (case-sensitive) (with ^ and $
# matching normally) (matching whitespace
# and # normally)
<title> # '<title>'
.*? # any character (0 or more times (matching
# the least amount possible))
</title> # '</title>'
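To see what the (?s) flag and the case-sensitivity noted in the explanation mean in practice, here is a small sketch on hypothetical inputs. The i modifier in the last replacement is an extra assumption, not part of the answer's pattern:

```php
<?php
// Hypothetical inputs: one title spans two lines, the other uses upper-case tags.
$multiLine = "<title>My\nPage</title>";
$upperCase = "<TITLE>My Page</TITLE>";

// Without (?s), . does not match \n, so nothing is replaced.
$noDotAll = preg_replace("~<title>.*?</title>~", "", $multiLine);

// With (?s), the newline is matched and the element is removed.
$dotAll = preg_replace("~(?s)<title>.*?</title>~", "", $multiLine);

// The pattern is case-sensitive, so <TITLE> is left alone...
$caseSensitive = preg_replace("~(?s)<title>.*?</title>~", "", $upperCase);

// ...unless the i modifier is added (assumption: upper-case tags may occur).
$caseInsensitive = preg_replace("~(?s)<title>.*?</title>~i", "", $upperCase);

var_dump($noDotAll === $multiLine);      // bool(true)
var_dump($dotAll);                       // string(0) ""
var_dump($caseSensitive === $upperCase); // bool(true)
var_dump($caseInsensitive);              // string(0) ""
```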
Upvotes: 5
Reputation: 532
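I suppose it should be like this instead. The tags are checked by a lookbehind and a lookahead but not consumed, so the match contains only the content in between:
preg_match('~(?<=<title>).*?(?=</title>)~s', $html, $matches);
$title = $matches[0] ?? ''; // text between the tags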
http://www.phpliveregex.com/p/1SJ
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
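A runnable sketch of the lookaround approach, assuming a made-up $html string. Using ~ as the delimiter means the / in </title> needs no escaping:

```php
<?php
// Hypothetical sample page.
$html = '<html><head><title>My Page</title></head><body></body></html>';

// (?<=<title>) and (?=</title>) assert the tags are there without including
// them, so the whole match ($matches[0]) is just the title text.
if (preg_match('~(?<=<title>).*?(?=</title>)~s', $html, $matches)) {
    echo $matches[0], "\n"; // My Page
}
```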
Upvotes: 0
Reputation: 5041
This will capture everything between the two tags in $matches[1]:
preg_match('~<title>(.+?)</title>~s', $html, $matches);
$title = $matches[1];
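For comparison, a sketch (on a hypothetical single-line page) of why the bare pattern <title>.+ is not enough: with delimiters added it matches, but .+ runs on past </title> to the end of the line, whereas a capture group bounded by </title> returns only the text:

```php
<?php
// Hypothetical sample page on a single line.
$html = '<html><head><title>My Page</title></head><body></body></html>';

// <title>.+ (delimiters added) keeps going past </title> to the end of the line.
preg_match('~<title>.+~', $html, $greedy);

// A capture group bounded by </title> returns only the text in between.
preg_match('~<title>(.*?)</title>~s', $html, $bounded);

echo $greedy[0], "\n";  // <title>My Page</title></head><body></body></html>
echo $bounded[1], "\n"; // My Page
```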
Upvotes: 0