Reputation: 458
I want to convert an HTML file with a table-based layout to plaintext in order to send a multipart email via PHP.
I have tried a few different pre-built classes / functions that I've found on SO, but none of them seem to produce decent results, which I believe is down to the table-based layout.
I don't want to roll my own class for stripping HTML and formatting the results, as I am sure there are edge cases which I won't account for or be able to test until I come across them in production.
The best solution I've come up with so far is:
This works fine, but I'm a little worried that it's not the optimal way of achieving a decent multipart email. Is anyone aware of a better way?
To clarify, I have already tried the following without success:
Upvotes: 2
Views: 2213
Reputation: 43265
PHP's DOMDocument should help you with this. You can traverse the DOM tree and pull out the relevant content as you want.
http://php.net/manual/en/class.domdocument.php
Related question on SO:
Parse HTML with PHP's HTML DOMDocument
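As a rough illustration of the traversal idea, here is a minimal sketch that walks the table rows and joins each row's cell text onto one line. The function name `tableHtmlToText()` and the ` | ` separator are my own choices for the example, not anything from a library. It assumes the input is ASCII or declares its charset, since `loadHTML()` otherwise guesses the encoding.

```php
<?php
// Hypothetical helper (illustrative name): flattens table rows to text lines.
function tableHtmlToText(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so sloppy email markup does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lines = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->childNodes as $cell) {
            if (!($cell instanceof DOMElement)
                || !in_array($cell->nodeName, ['td', 'th'], true)) {
                continue;
            }
            // Skip cells that wrap a nested layout table: the nested rows are
            // visited on their own, so this avoids emitting their text twice.
            // (Text sitting next to the nested table in the same cell is lost.)
            if ($cell->getElementsByTagName('table')->length > 0) {
                continue;
            }
            // Collapse runs of whitespace left over from HTML indentation.
            $text = trim(preg_replace('/\s+/u', ' ', $cell->textContent));
            if ($text !== '') {
                $cells[] = $text;
            }
        }
        if ($cells) {
            $lines[] = implode(' | ', $cells);
        }
    }
    return implode("\n", $lines);
}
```

Calling `tableHtmlToText($html)` on the email's HTML body would give you a starting point for the plaintext part; links, images, and headings would still need their own handling.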
Upvotes: 1
Reputation: 1312
I really don't think Lynx is the best solution :) I've used html2text myself; it works fine and is better than Lynx. If you'd prefer to do it with regexes, that would be much heavier than calling out to the system shell (shell_exec, system, exec, popen), since you'd need to preg_replace every unnecessary tag, and regex in PHP is very slow. So if this is on a Linux machine, I'd say it's better to pass the HTML to html2text.
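For what it's worth, here is a sketch of handing the HTML to an external converter with `proc_open`. The helper name `pipeThroughCommand()` is made up for the example, and it assumes a Unix-like system where the converter reads HTML on stdin and writes text to stdout. The input goes through a temp file so a large document can't block on a full pipe.

```php
<?php
// Hypothetical helper (illustrative name): runs $command with $input on stdin
// and returns whatever the command wrote to stdout.
function pipeThroughCommand(string $input, string $command): string
{
    // Stage the input in a temp file so the child reads it at its own pace.
    $tmp = tempnam(sys_get_temp_dir(), 'h2t');
    file_put_contents($tmp, $input);

    $spec = [
        0 => ['file', $tmp, 'r'], // child's stdin comes from the temp file
        1 => ['pipe', 'w'],       // child's stdout is read by us
    ];
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        unlink($tmp);
        throw new RuntimeException("Could not start: $command");
    }

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($proc);
    unlink($tmp);

    if ($exitCode !== 0) {
        throw new RuntimeException("$command exited with code $exitCode");
    }
    return $output;
}
```

Assuming an `html2text` binary is installed and on the PATH, usage would be along the lines of `$plain = pipeThroughCommand($html, 'html2text');`. There is more than one tool with that name, so check the output and options of the one your distribution ships.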
Upvotes: 2