JSoup - preserve html entities when outputting as utf-8?

Question

I want to preserve HTML entities while using JSoup. Here is a UTF-8 test string from a website:

String html = "hello &#151; world";

String parsed = Jsoup.parse(html).toString();

When the parsed output is printed as UTF-8, the sequence &#151; appears to be transformed into a character with a code point value of 151.

Is there a way to have JSoup preserve the original entity when outputting as UTF-8? If I output with ASCII encoding instead:

Document.OutputSettings settings = new Document.OutputSettings();
settings.charset(Charset.forName("ascii"));
Jsoup.parse(html).outputSettings(settings).toString();

I'll get:

hello &#151; world

which is what I'm looking for.
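To make the desired result concrete, here is a minimal sketch of one possible workaround: post-processing the UTF-8 output and rewriting every non-ASCII character as a decimal numeric entity. This is not a JSoup feature. The class `EntityEscaper` and the method `escapeNonAscii` are hypothetical names made up for illustration, and the input string simply stands in for text taken from JSoup's output.

```java
public class EntityEscaper {

    // Hypothetical helper (not part of JSoup): replaces every code point
    // above 127 with a decimal numeric character reference such as &#151;.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                  // plain ASCII, keep as-is
            } else {
                out.append("&#").append(cp).append(';'); // e.g. 151 -> &#151;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u0097 is the character with code point 151 described above.
        String utf8Output = "hello \u0097 world";
        System.out.println(escapeNonAscii(utf8Output)); // prints "hello &#151; world"
    }
}
```

Note that this escapes all non-ASCII characters, not only the ones that were written as entities in the source, so it cannot tell an original entity apart from a literal character.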
