Help With PHP and XPath

Question

I need help doing a few things with XPath in PHP.

With any given HTML, I need to:

• Remove all tables and their contents
• Remove everything after the first h1 tag
• Keep only paragraphs (INCLUDING their inner HTML (links, lists, etc))

With regex, I got everything working perfectly. When I encountered nested tables, however, I decided that it is indeed foolish to parse HTML with regex.

Thanks so much!

Dimitre Novatchev · Accepted Answer

With any given HTML, I need to:

• Remove all tables and their contents

• Remove everything after the first h1 tag

• Keep only paragraphs (INCLUDING their inner HTML (links, lists, etc))

This can be done very easily with XSLT:
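The stylesheet from the original answer is not reproduced in this copy of the thread. What follows is only a minimal sketch of how the three requirements could be expressed in XSLT 1.0 and run from PHP, not the answerer's own code. It assumes the `dom` and `xsl` extensions are enabled, that the input is well-formed XHTML loaded with `loadXML()` (so elements are in the XHTML namespace, bound here to the prefix `h:`), and that "after the first h1" means "following it in document order". The sample markup and the variable names are hypothetical.

```php
<?php
// Hypothetical sample input: well-formed XHTML in the XHTML namespace.
$xhtml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Intro with a <a href="#x">link</a>.</p>
    <table><tr><td><p>Inside a table</p></td></tr></table>
    <h1>Title</h1>
    <p>After the heading</p>
  </body>
</html>
XML;

// Sketch of a stylesheet (an assumption, not the original answer's code).
$xslt = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Start from paragraphs that are outside any table and that are
       not preceded by an h1 in document order. -->
  <xsl:template match="/">
    <xsl:apply-templates
        select="//h:p[not(ancestor::h:table)][not(preceding::h:h1)]"/>
  </xsl:template>

  <!-- Identity rule: copy each selected paragraph with its inner markup. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop any table met while copying, together with its contents. -->
  <xsl:template match="h:table"/>
</xsl:stylesheet>
XSL;

$doc = new DOMDocument();
$doc->loadXML($xhtml);

$style = new DOMDocument();
$style->loadXML($xslt);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// Serialize the kept paragraphs as a string.
$result = $proc->transformToXml($doc);
echo $result;
```

With the sample input, only the first paragraph (including its link) should survive; the paragraph inside the table and the one after the h1 are not selected. Note that markup loaded with `DOMDocument::loadHTML()` instead of `loadXML()` ends up with elements in no namespace, so the `h:` prefixes would not match in that case.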

In case your element names are not in the XHTML namespace, simply delete any occurrence of h: in the above code.
