droidgren

How do I convert a string to UTF-8 in Android?

I am using an HTML parser called Jsoup to load and parse HTML files. The problem is that the webpage I'm scraping is encoded in the ISO-8859-1 charset, while Android uses UTF-8 encoding(?). This results in some characters showing up as question marks.
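The question marks are what you typically get when bytes written in one charset are decoded as another. A minimal sketch in plain Java that reproduces the effect (it assumes API level 19+ for `StandardCharsets`; the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {

    // Decode bytes that were really written as ISO-8859-1 using the wrong charset (UTF-8)
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode the same bytes with the charset they were actually written in
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café" as an ISO-8859-1 server would send it: 0x63 0x61 0x66 0xE9
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // 0xE9 is not a valid UTF-8 sequence on its own, so it becomes U+FFFD,
        // which many fonts and consoles display as '?' or a boxed question mark
        System.out.println(decodeAsUtf8(latin1Bytes));

        // Decoding with the correct charset recovers the original text
        System.out.println(decodeAsLatin1(latin1Bytes));
    }
}
```

In other words, the text is damaged at the moment the bytes are turned into a String, not afterwards.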

So now I guess I should convert the string to UTF-8 format.

I have found a class called CharsetEncoder in the Android SDK, which I guess could help me. But I can't figure out how to use it in practice, so I wonder if I could get some help with a practical example.
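A Java String has no charset of its own (it is stored as UTF-16 internally), so the fix is to decode the raw bytes with the charset they were written in; a CharsetEncoder only comes into play if you then need UTF-8 bytes, for example to write a file. A minimal sketch using `java.nio.charset`, assuming the input is known to be ISO-8859-1 (class and method names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

public class CharsetConversion {

    // Decode raw ISO-8859-1 bytes into a Java String
    static String decodeLatin1(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder();
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(raw));
        return chars.toString();
    }

    // Encode a String into UTF-8 bytes, e.g. for writing to a file or socket
    static byte[] encodeUtf8(String text) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
        // Copy the encoded bytes out of the buffer into a plain array
        byte[] out = new byte[bytes.remaining()];
        bytes.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 }; // "café" in ISO-8859-1
        String text = decodeLatin1(latin1);
        byte[] utf8 = encodeUtf8(text);
        System.out.println(text + " -> " + utf8.length + " UTF-8 bytes");
    }
}
```

For one-off conversions, `new String(bytes, "ISO-8859-1")` and `text.getBytes("UTF-8")` do the same job with less code; the explicit decoder/encoder objects are mainly useful when you want to control how malformed input is handled.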

UPDATE: Code to read data (Jsoup)

URL url = new URL("http://www.example.com");
Document doc = Jsoup.parse(url, 4000); // 4000 = timeout in milliseconds

Answers (2)

droidgren

Byte encodings and Strings

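import java.io.UnsupportedEncodingException;

public class StringConverter {

    public static void main(String[] args) {
        // Print the platform default encoding (always UTF-8 on Android)
        System.out.println(System.getProperty("file.encoding"));
        String original = "A" + "\u00ea" + "\u00f1" + "\u00fc" + "C";

        System.out.println("original = " + original);
        System.out.println();

        try {
            // Encode explicitly as UTF-8, and with the platform default charset
            byte[] utf8Bytes = original.getBytes("UTF-8");
            byte[] defaultBytes = original.getBytes();

            // Decode the UTF-8 bytes back into a String
            String roundTrip = new String(utf8Bytes, "UTF-8");
            System.out.println("roundTrip = " + roundTrip);

            System.out.println();
            printBytes(utf8Bytes, "utf8Bytes");
            System.out.println();
            printBytes(defaultBytes, "defaultBytes");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    } // main

    // Print each byte of the array in hexadecimal
    public static void printBytes(byte[] array, String name) {
        for (int k = 0; k < array.length; k++) {
            System.out.println(name + "[" + k + "] = 0x"
                    + Integer.toHexString(array[k] & 0xff));
        }
    }
}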

Al Sutton

You can let Android do the work for you by reading the page into a byte[] and then using the Jsoup methods that parse String objects.

When you create the String from the data read from the server, don't forget to specify the encoding: use the String constructor that takes a charset name, e.g. new String(bytes, "ISO-8859-1"), rather than the one that falls back to the platform default.
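A minimal sketch of this approach, assuming the page is known to be ISO-8859-1 (class and method names are hypothetical). In a real app the InputStream would come from the URL, e.g. `url.openStream()`, and the charset could be taken from the response's Content-Type header instead of being hard-coded:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

public class PageDecoder {

    // Read an entire stream into a byte[] (a manual loop, so it also works on older Android)
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Build the String with the page's real charset instead of the platform default
    static String toText(byte[] data, String charsetName) throws UnsupportedEncodingException {
        return new String(data, charsetName);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the server response: ISO-8859-1 bytes for "<p>café</p>"
        byte[] served = "<p>caf\u00e9</p>".getBytes("ISO-8859-1");
        String html = toText(readAll(new ByteArrayInputStream(served)), "ISO-8859-1");
        System.out.println(html);
        // html can now be passed to Jsoup.parse(html); not called here so the sketch has no dependencies
    }
}
```

As an alternative, Jsoup also provides a `parse(InputStream in, String charsetName, String baseUri)` overload, which lets it do the decoding itself once you tell it the charset.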
