Reputation: 51
I can use JEditorPane to parse RTF text and convert it to HTML, but the HTML output is missing some formatting, namely the strike-through markup in this case. As you can see in the output below, the underlined text is correctly wrapped in <u>, but there is no strike-through wrapper. Any ideas?
public void testRtfToHtml()
{
    JEditorPane pane = new JEditorPane();
    pane.setContentType("text/rtf"); // the pane's document must be a styled document before the RTF kit reads into it
    StyledEditorKit kitRtf = (StyledEditorKit) pane.getEditorKitForContentType("text/rtf");
    try
    {
        kitRtf.read(
            new StringReader(
                "{\\rtf1\\ansi \\deflang1033\\deff0{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}{\\colortbl;\\red0\\green0\\blue0;} {\\stylesheet{\\fs20 \\snext0 Normal;}} {\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}{\\plain \\fs26 \\fs26 } {\\plain \\fs26 \\ul\\fs26 Underline text here} {\\plain \\fs26 \\fs26 .{\\u698\\'20}}"),
            pane.getDocument(), 0);
        kitRtf = null;
        StyledEditorKit kitHtml =
            (StyledEditorKit) pane.getEditorKitForContentType("text/html");
        Writer writer = new StringWriter();
        kitHtml.write(writer, pane.getDocument(), 0, pane.getDocument().getLength());
        System.out.println(writer.toString());
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}
p.Normal {
<p class=default>
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
This is supposed to be strike-through.
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
<u>Underline text here</u>
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
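One possible workaround, sketched under assumptions: instead of relying on the HTML kit's writer, walk the character runs of the StyledDocument yourself and emit the tags. This assumes the RTF reader stores \strike as the StyleConstants.StrikeThrough attribute on the run, which I have not verified against the RTF string above. StrikeAwareHtml, toHtml and escape are hypothetical names, and the sketch ignores paragraphs, fonts and colours.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

final class StrikeAwareHtml {

    /** Escapes the characters that are significant in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Walks the document one character run at a time and wraps each run
     * in <u> and/or <strike> according to its character attributes.
     */
    static String toHtml(StyledDocument doc) throws BadLocationException {
        StringBuilder html = new StringBuilder();
        int length = doc.getLength(); // excludes the implicit trailing newline
        int pos = 0;
        while (pos < length) {
            Element run = doc.getCharacterElement(pos);
            int end = Math.min(run.getEndOffset(), length);
            AttributeSet attrs = run.getAttributes();
            String text = escape(doc.getText(pos, end - pos));
            if (StyleConstants.isUnderline(attrs)) {
                text = "<u>" + text + "</u>";
            }
            if (StyleConstants.isStrikeThrough(attrs)) {
                text = "<strike>" + text + "</strike>";
            }
            html.append(text);
            pos = end;
        }
        return html.toString();
    }
}
```

Called as StrikeAwareHtml.toHtml((StyledDocument) pane.getDocument()) after the RTF has been read. For a document holding a struck run, a plain run and an underlined run, it returns markup of the form <strike>struck</strike> plain <u>under</u>. In modern HTML you would probably emit <s> or <del> rather than <strike>.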
Upvotes: 5
Views: 13038
Reputation: 11513
There's a pretty decent solution with the rtf-to-html open source library, which is still active (just released 1.1.0).
public String rtfToHtml(String rtfContent) {
    return RTF2HTMLConverterRFCCompliant.INSTANCE.rtf2html(rtfContent);
}
For legacy purposes, RTF2HTMLConverterClassic and RTF2HTMLConverterJEditorPane are available as well.
Upvotes: 0
Reputation: 1
Because of some bugs, I modified your function like this:
public static String rtfToHtml(String rtfText) {
    StringBuilder sb = new StringBuilder();
    if (rtfText != null) {
        // Process the RTF line by line so the greedy patterns cannot span several lines.
        String[] lignes = rtfText.split("[\\r\\n]+");
        for (String ligne : lignes) {
            String tempLine = ligne
                .replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*([^}]*)\\}", "$1")
                .replaceAll("\\\\htmlrtf0([^\\\\]*)\\\\htmlrtf", "$1")
                .replaceAll("\\\\htmlrtf \\{(.*)\\}\\\\htmlrtf0", "$1")
                .replaceAll("\\\\htmlrtf (.*)\\\\htmlrtf0", "")
                .replaceAll("\\\\htmlrtf[0]?", "")
                .replaceAll("\\\\field\\{\\\\\\*\\\\fldinst\\{[^}]*\\}\\}", "")
                .replaceAll("\\{\\\\fldrslt\\\\cf1\\\\ul([^}]*)\\}", "$1")
                .replaceAll("\\\\htmlbase", "")
                .replaceAll("\\\\par", "\n")
                .replaceAll("\\\\tab", "\t")
                .replaceAll("\\\\line", "\n")
                .replaceAll("\\\\page", "\n\n")
                .replaceAll("\\\\sect", "\n\n")
                .replaceAll("\\\\emdash", "&#x2014;")
                .replaceAll("\\\\endash", "&#x2013;")
                .replaceAll("\\\\emspace", "&#x2003;")
                .replaceAll("\\\\enspace", "&#x2002;")
                .replaceAll("\\\\qmspace", "&#x2005;")
                .replaceAll("\\\\bullet", "&#x2022;")
                .replaceAll("\\\\lquote", "&#x2018;")
                .replaceAll("\\\\rquote", "&#x2019;")
                .replaceAll("\\\\ldblquote", "&#x201C;")
                .replaceAll("\\\\rdblquote", "&#x201D;")
                .replaceAll("\\\\row", "\n")
                .replaceAll("\\\\cell", "|")
                .replaceAll("\\\\nestcell", "|")
                .replaceAll("([^\\\\])\\{", "$1")
                .replaceAll("([^\\\\])}", "$1")
                .replaceAll("[\\\\](\\{)", "$1")
                .replaceAll("[\\\\](})", "$1")
                .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
                .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;")
                .replaceAll("\"cid:(.*)@.*\"", "\"$1\"")
                .replaceAll(" {2,}", " ");
            // Keep only lines that still contain something other than whitespace.
            if (!tempLine.replaceAll("\\s+", "").isEmpty()) {
                // Assumption: the newline separator is a guess, the original statement is missing here.
                sb.append(tempLine).append("\n");
            }
        }
        rtfText = sb.toString();
        // Everything before the <html tag is RTF header noise.
        int index = rtfText.indexOf("<html");
        if (index != -1) {
            return rtfText.substring(index);
        }
    }
    return null;
}
Upvotes: 0
Reputation: 21
Here is a function I'm using to convert RTF to HTML from a .msg body. See my Outlook message parser yamp repository on GitHub.
public static String rtfToHtml(String rtfText) {
    if (rtfText != null) {
        // Unwrap the encapsulated HTML tags and drop the RTF-only parts.
        rtfText = rtfText.replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*(.*)}", "$1")
            .replaceAll("\\\\htmlrtf[1]?(.*)\\\\htmlrtf0", "")
            .replaceAll("\\\\htmlrtf[01]?", "")
            .replaceAll("\\\\htmlbase", "")
            // Layout control words become plain whitespace.
            .replaceAll("\\\\par", "\n")
            .replaceAll("\\\\tab", "\t")
            .replaceAll("\\\\line", "\n")
            .replaceAll("\\\\page", "\n\n")
            .replaceAll("\\\\sect", "\n\n")
            // Special-character control words become hexadecimal HTML entities.
            .replaceAll("\\\\emdash", "&#x2014;")
            .replaceAll("\\\\endash", "&#x2013;")
            .replaceAll("\\\\emspace", "&#x2003;")
            .replaceAll("\\\\enspace", "&#x2002;")
            .replaceAll("\\\\qmspace", "&#x2005;")
            .replaceAll("\\\\bullet", "&#x2022;")
            .replaceAll("\\\\lquote", "&#x2018;")
            .replaceAll("\\\\rquote", "&#x2019;")
            .replaceAll("\\\\ldblquote", "&#x201C;")
            .replaceAll("\\\\rdblquote", "&#x201D;")
            .replaceAll("\\\\row", "\n")
            .replaceAll("\\\\cell", "|")
            .replaceAll("\\\\nestcell", "|")
            // Remove group braces, then unescape literal \{ and \}.
            .replaceAll("([^\\\\])\\{", "$1")
            .replaceAll("([^\\\\])}", "$1")
            .replaceAll("[\\\\](\\{)", "$1")
            .replaceAll("[\\\\](})", "$1")
            // \uN and \'hh escapes become numeric character references.
            .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
            .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;")
            .replaceAll("\"cid:(.*)@.*\"", "\"$1\"");
        // Everything before the <html tag is RTF header noise.
        int index = rtfText.indexOf("<html");
        if (index != -1) {
            return rtfText.substring(index);
        }
    }
    return null;
}
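To make the individual replacements easier to follow, here is a small sketch that isolates three of them. EncapsulatedHtmlDemo and its method names are made up for illustration; the patterns are the non-greedy htmltag pattern from the modified version elsewhere in this thread and the two escape patterns used above. Note two limitations: \'hh is a byte in the document's code page, so mapping it straight to &#xhh; is only right where that code page agrees with Unicode, and the fallback character that follows a \uN escape is left in place.

```java
final class EncapsulatedHtmlDemo {

    /** Replaces {\*\htmltagN payload} groups with their payload only. */
    static String unwrapHtmlTags(String rtf) {
        return rtf.replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*([^}]*)\\}", "$1");
    }

    /** Turns \'hh escapes into hexadecimal character references. */
    static String hexEscapesToEntities(String rtf) {
        return rtf.replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;");
    }

    /** Turns \uN escapes (non-negative N only) into decimal character references. */
    static String unicodeEscapesToEntities(String rtf) {
        return rtf.replaceAll("\\\\u([0-9]{2,5})", "&#$1;");
    }
}
```

For example, unwrapHtmlTags applied to {\*\htmltag64 <p>} yields " <p>" (the space after the control word survives), hexEscapesToEntities applied to caf\'e9 yields caf&#xe9;, and unicodeEscapesToEntities applied to \u8364? yields &#8364;? with the fallback "?" still attached.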
Upvotes: 0
Reputation: 5451
You could try converting with OpenOffice or LibreOffice, using this converter library as described in this blog post.
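If you don't want to pull in a library, a rough alternative is to shell out to LibreOffice directly. This is not the converter library mentioned above, just a sketch of the plain command-line route. It assumes a soffice executable is on the PATH; SofficeRtfToHtml, buildCommand and convert are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class SofficeRtfToHtml {

    /** Builds the headless conversion command; "soffice" is assumed to be on the PATH. */
    static List<String> buildCommand(Path rtfFile, Path outDir) {
        return Arrays.asList(
            "soffice", "--headless",
            "--convert-to", "html",
            "--outdir", outDir.toString(),
            rtfFile.toString());
    }

    /** Runs the conversion and returns the exit code of the soffice process. */
    static int convert(Path rtfFile, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(rtfFile, outDir))
            .inheritIO() // forward soffice's own messages to this process's console
            .start();
        return process.waitFor();
    }
}
```

The converted file should end up in the output directory under the input's base name with an .html extension. Starting an office process per document is slow, so this only makes sense for occasional or batch conversions.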
Upvotes: 3