Johannes

Reputation: 135

Java Html Table to Plain text

We save incoming emails in a database, and we also save a version with all the HTML tags removed. The problem is that if the mail includes a table like this:

Heading1 Heading2

column1 column2

it looks like this after the tags are removed:

Heading1

Heading2

column1

column2

Is there a simple way to take an HTML table and turn it into plain text with the formatting still intact, or at least with line breaks in the right places?

So the table turns into something like: Heading1 Heading2 \r\n column1 column2 \r\n. Or something similar.

Any ideas?

Upvotes: 1

Views: 1067

Answers (1)

Eric Galluzzo

Reputation: 3241

A simple way? Not really. HTML tables are complex: they can have row spans and column spans, not to mention ordinary HTML attributes such as those controlling bidirectional text. CSS properties like display: table-cell; can also cause otherwise ordinary HTML to suddenly behave as a table.

However, if you don't care too much about formatting and just want to output multiple columns on the same line, you could parse the HTML using something like JTidy or Jericho, then output the contents of the <td> and <th> elements with spaces between them, and when you reach the end of a <tr> element, output "\r\n".
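A minimal sketch of the parse-and-emit idea above. To avoid an external dependency, it swaps in the JDK's built-in Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) instead of JTidy or Jericho; the same callback logic should carry over to either library. The class name TableToText and the method convert are made up for illustration, and the sketch assumes simple tables with no row spans, column spans or nested tables.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: cells on one line separated by spaces, "\r\n" after each row.
public class TableToText {

    public static String convert(String html) {
        final StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean firstCellInRow = true;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    firstCellInRow = true;          // a new row starts
                } else if (tag == HTML.Tag.TD || tag == HTML.Tag.TH) {
                    if (!firstCellInRow) {
                        out.append(' ');            // separate cells within a row
                    }
                    firstCellInRow = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                out.append(new String(data).trim());
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TR) {
                    out.append("\r\n");             // end of row -> line break
                }
            }
        };

        try {
            // third argument: ignore charset declarations found in the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

For the table in the question, TableToText.convert(html) is meant to yield the two rows on separate lines, each ending in "\r\n". Text outside tables is appended as-is, so a real mail body would need additional handling for paragraphs and line breaks.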

If you really don't want to parse the HTML, you could just replace the <td> and <th> tags themselves with a single space or tab, and <tr> with a line break. This may give you at least reasonable results.
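The no-parser variant could be sketched with plain regular expressions. This is a slight variation on the description above: it turns the boundary between two adjacent cells into a tab and the closing </tr> into "\r\n", which avoids leading or trailing separators on each line. The class name TableTagReplacer is made up for illustration, and like any regex-over-HTML approach it will misbehave on nested tables, on ">" inside attribute values, and on similarly irregular markup.

```java
// Hypothetical helper: regex-only conversion, no HTML parser involved.
public class TableTagReplacer {

    public static String convert(String html) {
        return html
                // boundary between two cells (</td><td>, </th><th ...>) -> tab
                .replaceAll("(?i)</t[dh]\\s*>\\s*<t[dh]\\b[^>]*>", "\t")
                // end of a row -> line break
                .replaceAll("(?i)</tr\\s*>", "\r\n")
                // strip every remaining tag
                .replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Heading1</th><th>Heading2</th></tr>"
                + "<tr><td>column1</td><td>column2</td></tr></table>";
        System.out.print(convert(html));
    }
}
```

For the sample table this gives "Heading1\tHeading2\r\ncolumn1\tcolumn2\r\n". Run the row and cell replacements before the generic tag-stripping step, otherwise the table structure is already gone by the time they are applied.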

Upvotes: 2
