user3388473

Reputation: 963

How to convert Shift-JIS encoded string to UTF-8?

I am fetching HTML source from Aozora Bunko. The HTML file is Shift-JIS encoded. I am trying to extract the book title and author, and then I want to store them in a SQLite (UTF-8) database.

    String[] splittedResult = result.split("\"title\">");
    splittedResult = splittedResult[1].split("</h1>");
    String title = splittedResult[0];
    byte[] b = null;
    try {
        b = title.getBytes("Shift_JIS");
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    String value = null;
    try {
        value = new String(b, "UTF-8");
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }

    ...
    myDatabase.addBookInformation(value, author);

The result: Latin letters display normally, but Japanese characters appear as blocks with a question mark inside (please ignore the null values).


How can I solve this problem?

Upvotes: 1

Views: 4417

Answers (1)

user3388473

Reputation: 963

As @Codo pointed out, the problem had to be fixed earlier in the code, at the point where the response is read. I changed this

    s = EntityUtils.toString(response.getEntity(), "UTF-8");

to this

    s = EntityUtils.toString(response.getEntity(), "Shift_JIS");

Now there is no need for any further encoding conversion:

    String[] splittedResult = result.split("\"title\">");
    splittedResult = splittedResult[1].split("</h1>");
    String title = splittedResult[0];
    /* I HAVE REMOVED THIS PART OF MY CODE
    byte[] b = null;
    try {
        b = title.getBytes("Shift_JIS");
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    String value = null;
    try {
        value = new String(b, "UTF-8");
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    */
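
To illustrate why this works, here is a minimal, self-contained sketch (not the original app code). The class name `ShiftJisDemo`, the `decode` helper and the sample title are made up for the example, and it assumes the runtime ships the `Shift_JIS` charset, which standard JDK builds and Android generally do. The byte array stands in for the raw HTTP response body. Decoding those bytes as UTF-8 yields replacement characters (U+FFFD) that no later conversion can undo, while decoding them as Shift_JIS gives the intended text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ShiftJisDemo {
    // Shift_JIS is not among the charsets every JVM is required to support,
    // so this lookup may throw UnsupportedCharsetException on a stripped-down runtime.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Hypothetical helper: interpret raw bytes using the given charset.
    static String decode(byte[] rawBytes, Charset charset) {
        return new String(rawBytes, charset);
    }

    public static void main(String[] args) {
        // Sample text "aozora" in hiragana, written with Unicode escapes so the
        // source file encoding does not matter.
        String title = "\u3042\u304A\u305E\u3089";

        // Stand-in for the bytes the server would send for a Shift_JIS page.
        byte[] raw = title.getBytes(SHIFT_JIS);

        // Wrong charset: the bytes are not valid UTF-8, so the decoder
        // substitutes U+FFFD and the original characters are lost.
        String wrong = decode(raw, StandardCharsets.UTF_8);

        // Right charset: the original text comes back intact.
        String right = decode(raw, SHIFT_JIS);

        System.out.println("UTF-8 decode matches title:     " + wrong.equals(title));
        System.out.println("Shift_JIS decode matches title: " + right.equals(title));
    }
}
```

Once the text has been decoded with the correct charset, it is an ordinary Java `String`, so it can be handed to the database layer as-is; there is no separate "convert to UTF-8" step to perform on the string itself.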

Upvotes: 1
