Reputation: 135
We save incoming emails in a database. We also save a version with all the HTML tags removed. The problem is that if the mail includes a table like this:
Heading1 Heading2
column1 column2
it looks like this after the tags are removed:
Heading1
Heading2
column1
column2
Is there a simple way to take an HTML table and turn it into plain text with the formatting still intact, or at least with line breaks in the right places?
So the table turns into something like: Heading1 Heading2 \r\n column1 column2 \r\n. Or something similar.
Any ideas?
Upvotes: 1
Views: 1067
Reputation: 3241
A simple way? Not really. HTML tables are complex: they can have row spans and column spans, not to mention ordinary HTML features such as bidirectional text. CSS properties like display: table-cell; can also make otherwise ordinary HTML render as a table.
However, if you don't care too much about formatting and just want to output multiple columns on the same line, you could parse the HTML using something like JTidy or Jericho, then output the contents of consecutive <td> or <th> elements with spaces between them, and when you reach the end of a <tr> element, output "\r\n".
If you really don't want to parse the HTML, you could just replace the <td> and <th> tags themselves with a single space or tab, and <tr> with a line break, before stripping the remaining tags. This may get you at least some reasonable results.
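As a rough illustration of the no-parser approach, here is a minimal sketch using only java.util.regex. The class name TableToText and the method convert are made up for this example, and it assumes simple, reasonably well-formed markup: nested tables, row or column spans, HTML entities, and attribute values containing ">" are not handled. Empty cells are dropped, so columns are not guaranteed to line up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: flattens HTML to text, keeping one table row per line.
final class TableToText {
    // Opening or closing <tr> marks a row boundary.
    private static final Pattern ROW_TAG = Pattern.compile("(?i)</?tr\\b[^>]*>");
    // Opening or closing <td>/<th> marks a cell boundary.
    private static final Pattern CELL_TAG = Pattern.compile("(?i)</?t[dh]\\b[^>]*>");
    // Anything else that looks like a tag is simply removed.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    static String convert(String html) {
        // Collapse whitespace first so line breaks from the source markup
        // cannot be confused with the row markers inserted below.
        String s = html.replaceAll("\\s+", " ");
        s = ROW_TAG.matcher(s).replaceAll("\n");   // row boundary marker
        s = CELL_TAG.matcher(s).replaceAll("\t");  // cell boundary marker
        s = ANY_TAG.matcher(s).replaceAll("");     // strip all remaining tags

        StringBuilder out = new StringBuilder();
        for (String row : s.split("\n")) {
            List<String> cells = new ArrayList<>();
            for (String cell : row.split("\t")) {
                String text = cell.trim();
                if (!text.isEmpty()) {
                    cells.add(text);
                }
            }
            // Skip rows that ended up empty; join the rest with single spaces.
            if (!cells.isEmpty()) {
                out.append(String.join(" ", cells)).append("\r\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Heading1</th><th>Heading2</th></tr>"
                    + "<tr><td>column1</td><td>column2</td></tr></table>";
        System.out.print(convert(html));
    }
}
```

For the table in the question this should give "Heading1 Heading2" and "column1 column2", each followed by \r\n. For anything more complicated than flat tables, a real parser such as Jericho or JTidy is likely the safer choice.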
Upvotes: 2