Regex to exclude content between title tags

What's wrong with this regex for excluding the content of the title tag?

$plaintext = preg_match('#<title>(.*?)</title>#', $html);

$html contains the HTML of the entire page.

Upvotes: 3

Answers (3)

zx81

Reputation: 41848

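It sounds like you never got a working answer. preg_match only returns 1 or 0 (whether the pattern matched), so it cannot strip anything. To remove the title tag and its content, use preg_replace.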

Search: (?s)<title>.*?</title>

Replace: ""

Code:

$regex = "~(?s)<title>.*?</title>~";
$replaced = preg_replace($regex, "", $pagecontent);

Explain Regex

(?s)                     # set flags for this block (with . matching
                         # \n) (case-sensitive) (with ^ and $
                         # matching normally) (matching whitespace
                         # and # normally)
<title>                  # '<title>'
.*?                      # any character (0 or more times (matching
                         # the least amount possible))
</title>                 # '</title>'
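
For comparison, here is a minimal sketch of how the two functions behave on the same input. The sample page in $html and the variable names ($found, $title, $withoutTitle) are made up for illustration, and the s modifier after the closing delimiter is used in place of the inline (?s); the two should be equivalent here.

```php
<?php
// Hypothetical sample page; $html stands in for the full page source.
$html = "<html><head><title>My\nPage</title></head><body><p>Hello</p></body></html>";

// preg_match() returns 1 if the pattern matched, 0 if not (false on error).
// The matched text is not returned; captured groups are written to the
// optional third argument instead.
$found = preg_match('~<title>(.*?)</title>~s', $html, $matches);
$title = ($found === 1) ? $matches[1] : null;

// preg_replace() returns the subject string with every match replaced,
// which is what removes the title element from the page.
$withoutTitle = preg_replace('~<title>.*?</title>~s', '', $html);

var_dump($found);
var_dump($title);
var_dump($withoutTitle);
```

Note that this pattern only matches a lowercase <title> with no attributes. If the pages you handle might use <TITLE>, adding the i modifier (~...~si) would make the match case-insensitive.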

Upvotes: 5