Reputation: 4089
I'm converting some HTML to plain text, and I was using jsoup's HtmlToPlainText. However, in recent jsoup releases that class is no longer included, because it is supposedly provided only as an example (although the HtmlToPlainText Javadoc still says it's part of jsoup.jar).
Other than manually copying or packaging that code as an additional library, what else can I use instead? Is there an alternative included in jsoup or at least based on jsoup?
Upvotes: 6
Views: 1984
Reputation: 722
We recently switched from jsoup to Jericho:
return new Source(html).getRenderer().setMaxLineLength(Integer.MAX_VALUE).setNewLine(null).toString();
With this Maven dependency:
<dependency>
<groupId>net.htmlparser.jericho</groupId>
<artifactId>jericho-html</artifactId>
<version>3.4</version>
</dependency>
Upvotes: 3
Reputation: 42585
The class HtmlToPlainText is an example of how to use the jsoup library. If you want to use it, you have to copy its source code into your own project. All the classes it references are included in the jsoup library, so you only need this one class.
Afterwards you can use it this way:
Document doc = Jsoup.parse(html);
String text = new HtmlToPlainText().getPlainText(doc.body());
Copying the code into your project has the advantage that you can modify the HtmlToPlainText class and adapt it to your needs, e.g. whether links are shown with their URL or not.
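If copying the whole class feels like too much, a rough sketch of the same idea may be enough. The example below assumes a reasonably recent jsoup (1.11.1 or later, for the static NodeTraversor.traverse) on the classpath. The class name PlainTextSketch and its two methods are made up for illustration and are not part of jsoup. It shows the built-in Element.text(), which collapses everything into one line, and a small NodeVisitor that keeps line breaks after block elements. It is far simpler than the real HtmlToPlainText (no link URLs, no list bullets, no line wrapping).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

// Hypothetical helper class, not part of jsoup.
public class PlainTextSketch {

    // Built-in option: whitespace-normalized text on a single line.
    static String flatText(String html) {
        return Jsoup.parse(html).text();
    }

    // Visitor-based option: keeps a newline after <br> and block elements.
    static String toPlainText(String html) {
        final StringBuilder out = new StringBuilder();
        NodeTraversor.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // text() normalizes runs of whitespace to single spaces
                    out.append(((TextNode) node).text());
                } else if (node.nodeName().equals("br")) {
                    out.append('\n');
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // End every block-level element (p, div, li, ...) with a newline
                if (node instanceof Element && ((Element) node).isBlock()) {
                    out.append('\n');
                }
            }
        }, Jsoup.parse(html).body());
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p><p>Bye</p>";
        System.out.println(flatText(html));
        System.out.println(toPlainText(html));
    }
}
```

If one line of text is all you need, flatText alone avoids any copied code. The visitor version is the place to add your own rules, which is the same advantage as adapting a copied HtmlToPlainText.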
Upvotes: 2