Avoid removal of spaces and newline while parsing HTML using jsoup

Question

I have the following sample code.

String sample = "<html>\n"
        + "<head>\n"
        + "</head>\n"
        + "<body>\n"
        + "This is a sample on              parsing HTML body using jsoup\n"
        + "This is a sample on              parsing HTML body using jsoup\n"
        + "</body>\n"
        + "</html>";

Document doc = Jsoup.parse(sample);
String output = doc.body().text();

I get the output as

This is a sample on parsing HTML body using jsoup This is a sample on parsing HTML body using jsoup

But I want the output as

This is a sample on              parsing HTML body using jsoup
This is a sample on              parsing HTML body using jsoup

How do I parse it so that I get this output? Or is there another way to do this in Java?

Benjamin P. · Accepted Answer

You can get the output you want by disabling pretty printing on the document. You also have to change .text() to .html(), because .text() returns whitespace-normalized text.

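import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
Document doc = Jsoup.parse(sample);
// Turn off pretty printing so the serializer leaves the original whitespace alone.
doc.outputSettings(new Document.OutputSettings().prettyPrint(false));
String output = doc.body().html(); // inner HTML of <body>, spaces and newlines kept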
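
To see why .text() loses the layout in the first place, here is a rough plain-Java sketch of that kind of normalization: every run of whitespace collapses to a single space and the ends are trimmed. This is only an illustration of the effect, not jsoup's actual implementation; the class name WhitespaceDemo and the method normalize are made up for this example.

```java
class WhitespaceDemo {
    // Hypothetical stand-in for the whitespace normalization that .text() applies:
    // collapse each run of spaces, tabs and newlines into one space, then trim.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Roughly the text content of <body> from the question, whitespace intact.
        String body = "\nThis is a sample on              parsing HTML body using jsoup\n"
                + "This is a sample on              parsing HTML body using jsoup\n";

        System.out.println("normalized: " + normalize(body));
        System.out.println("raw:" + body);
    }
}
```

The normalized line matches what the question reports for .text(), while the raw string still has the run of spaces and the line break. That is the difference between .text() and the un-prettified .html() output.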
