Specify utf-8 character encoding in RTF? The text (in UTF-8) format is correctly shown in Sqlite

Question

How can I specify the character encoding in an RTF document for text that is encoded as UTF-8?

I studied similar questions, but did not find a good solution. So, I hope you can help.

The content is in a Sqlite database. Text in a Sqlite database can only be encoded as UTF-8, UTF-16 or similar. That's why I have to stick to UTF-8.

The ë is shown correctly using a Sqlite database browser.

The required target program, which can only read RTF, displays the characters in a strange way.

I tried for example:

{\rtf1\ansi\ansicpg0\uc0...
{\rtf1\ansi\ansicpg1252\uc0...
{\rtf1\ansi\ansicpg65001\uc0...

One option is to map the special characters to their RTF escape equivalents, as shown in this table.

tm1701 · Accepted Answer

I read in many places that RTF doesn't have a UTF-8 standard solution.
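That said, RTF does have an encoding-independent escape: the `\uN` control word, where N is a UTF-16 code unit written as a signed 16-bit decimal number. Below is a minimal sketch of that idea, not the converter I ended up with. It assumes the header declares `\uc0` (as in the attempts in the question), so no fallback character has to follow each escape; the class and method names are made up for illustration.

```java
final class RtfUnicodeEscaper {

    /** Escapes RTF control characters and writes every non-ASCII UTF-16 code unit as a Unicode escape. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // Backslash and braces are RTF syntax and need a leading backslash.
                sb.append('\\').append(c);
            } else if (c < 128) {
                sb.append(c);
            } else {
                // The escape takes a signed 16-bit value, so the cast to short
                // turns code units above 32767 into negative numbers.
                // The trailing space terminates the control word.
                sb.append("\\u").append((int) (short) c).append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9 {x}"));
    }
}
```

Characters outside the Basic Multilingual Plane are stored as two code units in a Java String, so they come out as two consecutive escapes. Whether the target program honours these escapes has to be tested; I cannot promise that every RTF reader does.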

So, I created my own converter after scanning half the internet. If you have a standard/better solution, please let me know!

After studying this book, I created a converter based on these character mappings. Both are great resources.

This solved my problem. I would have preferred to reuse an existing solution for this kind of feature, but I was not able to find one, alas.

The converter could be something like:

public static String convertHtmlToRtf(String html) {
    // Flatten line breaks, then escape the RTF syntax characters \ { }.
    String tmp = html.replaceAll("\\R", " ")
            .replace("\\", "\\\\")
            .replace("{", "\\{")
            .replace("}", "\\}");

    // NOTE: the HTML tag and entity patterns were lost when this post was rendered.
    // The ones below are reconstructions inferred from the RTF they map to.
    // Hyperlinks, with double-quoted and single-quoted href values.
    String link = "{\\\\field{\\\\*\\\\fldinst HYPERLINK \"$1\"}"
            + "{\\\\fldrslt \\\\plain \\\\f2\\\\b\\\\fs20\\\\cf2 $2}}";
    tmp = tmp.replaceAll("<a href=\"([^\"]+?)\">([^<]+?)</a>", link);
    tmp = tmp.replaceAll("<a href='([^']+?)'>([^<]+?)</a>", link);

    // Headings, bold and italic. String.replace is literal, so no regex escaping is needed.
    tmp = tmp.replace("<h1>", "\\line{\\b\\fs30{");
    tmp = tmp.replace("</h1>", "}}\\line\\line ");
    tmp = tmp.replace("<b>", "{\\b{");
    tmp = tmp.replace("</b>", "}}");
    tmp = tmp.replace("<strong>", "{\\b{");
    tmp = tmp.replace("</strong>", "}}");
    tmp = tmp.replace("<i>", "{\\i{");
    tmp = tmp.replace("</i>", "}}");

    // Line breaks and paragraphs.
    tmp = tmp.replace("<br><br>", "{\\pard \\par}\\line ");
    tmp = tmp.replace("<br>", "\\line ");
    tmp = tmp.replace("<br/>", "\\line ");
    tmp = tmp.replaceAll("<p[^>]*?>", "{\\\\pard ");
    tmp = tmp.replace("</p>", " \\par}\\line ");

    // Decode entities after the tags so an escaped "&lt;br&gt;" is not treated as a tag,
    // and decode &amp; last so "&amp;lt;" does not turn into "<".
    tmp = tmp.replace("&quot;", "\"");
    tmp = tmp.replace("&copy;", "{\\'a9}");
    tmp = tmp.replace("&lt;", "<");
    tmp = tmp.replace("&gt;", ">");
    tmp = tmp.replace("&amp;", "&");

    tmp = convertSpecialCharsToRtfCodes(tmp);
    return "{\\rtf1\\ansi\\ansicpg0\\uc0\\deff0\\deflang0\\deflangfe0\\fs20"
            + "{\\fonttbl{\\f0\\fnil Tahoma;}{\\f1\\fnil Tahoma;}{\\f2\\fnil\\fcharset0 Tahoma;}}"
            + "{\\colortbl;\\red0\\green0\\blue0;\\red0\\green0\\blue255;\\red0\\green255\\blue0;\\red255\\green0\\blue0;}"
            + tmp + "}";
}

// Maps selected non-ASCII characters to RTF hex escapes (\'xx, Windows-1252 code points).
private static String convertSpecialCharsToRtfCodes(String input) {
    char[] chars = input.toCharArray();
    StringBuilder sb = new StringBuilder();
    int length = chars.length;
    for (int i = 0; i < length; i++) {
        switch (chars[i]) {
            case '’':
                sb.append("{\\'92}");
                break;
            case '`':
                sb.append("{\\'60}");
                break;
            case '€':
                sb.append("{\\'80}");
                break;
            case '…':
                sb.append("{\\'85}");
                break;
            case '‘':
                sb.append("{\\'91}");
                break;
            case '̕':
                sb.append("{\\'92}");
                break;
            case '“':
                sb.append("{\\'93}");
                break;
            case '”':
                sb.append("{\\'94}");
                break;
            case '•':
                sb.append("{\\'95}");
                break;
            case '–':
            case '‒':
                sb.append("{\\'96}");
                break;
            case '—':
                sb.append("{\\'97}");
                break;
            case '©':
                sb.append("{\\'a9}");
                break;
            case '«':
                sb.append("{\\'ab}");
                break;
            case '±':
                sb.append("{\\'b1}");
                break;
            case '„':
                sb.append("\"");
                break;
            case '´':
                sb.append("{\\'b4}");
                break;
            case '¸':
                sb.append("{\\'b8}");
                break;
            case '»':
                sb.append("{\\'bb}");
                break;
            case '½':
                sb.append("{\\'bd}");
                break;
            case 'Ä':
                sb.append("{\\'c4}");
                break;
            case 'È':
                sb.append("{\\'c8}");
                break;
            case 'É':
                sb.append("{\\'c9}");
                break;
            case 'Ë':
                sb.append("{\\'cb}");
                break;
            case 'Ï':
                sb.append("{\\'cf}");
                break;
            case 'Í':
                sb.append("{\\'cd}");
                break;
            case 'Ó':
                sb.append("{\\'d3}");
                break;
            case 'Ö':
                sb.append("{\\'d6}");
                break;
            case 'Ü':
                sb.append("{\\'dc}");
                break;
            case 'Ú':
                sb.append("{\\'da}");
                break;
            case 'ß':
            case 'β':
                sb.append("{\\'df}");
                break;
            case 'à':
                sb.append("{\\'e0}");
                break;
            case 'á':
                sb.append("{\\'e1}");
                break;
            case 'ä':
                sb.append("{\\'e4}");
                break;
            case 'è':
                sb.append("{\\'e8}");
                break;
            case 'é':
                sb.append("{\\'e9}");
                break;
            case 'ê':
                sb.append("{\\'ea}");
                break;
            case 'ë':
                sb.append("{\\'eb}");
                break;
            case 'ï':
                sb.append("{\\'ef}");
                break;
            case 'í':
                sb.append("{\\'ed}");
                break;
            case 'ò':
                sb.append("{\\'f2}");
                break;
            case 'ó':
                sb.append("{\\'f3}");
                break;
            case 'ö':
                sb.append("{\\'f6}");
                break;
            case 'ú':
                sb.append("{\\'fa}");
                break;
            case 'ü':
                sb.append("{\\'fc}");
                break;
            default:
                if (chars[i] != ' ' && Character.isSpaceChar(chars[i])) {
                    // Any other Unicode space (e.g. non-breaking space) becomes a plain space.
                    System.out.print(".");
                    // sb.append("{\\~}");
                    sb.append(" ");
                } else if (chars[i] == 8218) {
                    // U+201A, single low-9 quotation mark, which looks like a comma.
                    System.out.println("Strange comma ... ");
                    sb.append(",");
                } else if (chars[i] > 132) {
                    // Not in the table above: report it and pass it through unchanged.
                    System.err.println("Special code that is not translated in RTF: '" + chars[i] + "', number=" + (int) chars[i]);
                    sb.append(chars[i]);
                } else {
                    sb.append(chars[i]);
                }
        }
    }
    return sb.toString();
}
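
The long switch above is essentially a hand-written Windows-1252 table. A shorter variant might let the JDK do the lookup. This is only a sketch under a few assumptions: the `windows-1252` charset is available in the running JVM (it is not one of the charsets Java guarantees), backslashes and braces have already been escaped by the caller as in convertHtmlToRtf, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class Cp1252RtfEscaper {

    // Assumption: this charset exists in the running JVM.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Writes every non-ASCII character that Windows-1252 can represent as a {\'xx} hex escape. */
    static String escape(String input) {
        CharsetEncoder encoder = CP1252.newEncoder();
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                // Plain ASCII passes through; RTF syntax characters are the caller's job.
                sb.append(c);
            } else if (encoder.canEncode(c)) {
                // One character maps to exactly one byte in this code page.
                byte b = String.valueOf(c).getBytes(CP1252)[0];
                sb.append(String.format("{\\'%02x}", b & 0xff));
            } else {
                // No Windows-1252 equivalent: pass through unchanged, like the switch's default branch.
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u00e9 \u20ac \u2019"));
    }
}
```

Unlike the switch, this does not fold look-alike characters (for example β to ß, or the low quotation mark to a comma), so those special cases would still need handling of their own.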