user1014917

Reputation: 681

Special characters are displayed as question marks in black diamonds

I'm developing applications for Android devices and recently ran into a problem.

I needed to get information out of an HTML file online, so I combined an InputStream and a BufferedReader to scan the file. I split each string to extract the information and tried displaying it with a Toast.

Everything works the way I want it to, except that every time a special character should be displayed, a question mark in a black diamond appears instead.

I think it might be a charset problem, because the website declares the following in its HTML:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

How do I get this right?

EDIT :

HttpClient httpClient = new DefaultHttpClient();
HttpPost post = new HttpPost(url);
((AbstractHttpClient) httpClient).getCredentialsProvider().setCredentials(new AuthScope(null, -1), new UsernamePasswordCredentials("user","password"));
HttpResponse response;
response = httpClient.execute(post);
BufferedReader reader = new BufferedReader(
    new InputStreamReader(
        response.getEntity().getContent()
    )
);
String line = null;
while ((line = reader.readLine()) != null) {
    Toast.makeText(this, line, Toast.LENGTH_LONG).show();
}

Upvotes: 1

Views: 3308

Answers (4)

William T. Mallard

Reputation: 1660

Just in case someone else has the same problem I had...

I was getting the same question mark in a black diamond for text I pulled from a JSON file I'd loaded from res/raw. No matter what combination of stream readers I tried, the characters would still appear. My first attempt to ensure I was using UTF-8 was to check the file properties in Eclipse, and sure enough the encoding was set to "MacRoman", whatever that is. I changed it to UTF-8, built, ran, failed, cleaned, built, ran, failed, scratched my head, and came back to SO.

I read that I had to save the file after changing the encoding, so I tried that, still with no luck. I finally scrolled down in the JSON file in the Eclipse editor to where the special characters were, and interestingly the special characters (é and an em dash) were showing as black diamonds there as well! I deleted and re-entered them and everything worked fine.

Bottom line: encoding matters, and when creating a resource file (XML, JSON, CSV or whatever) make sure you select the proper encoding (usually UTF-8) BEFORE you start entering text.
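
To illustrate why the black diamond shows up at all, here is a minimal sketch (not from the original answer; the class and method names are made up for the example). It decodes the same byte once with the wrong charset and once with the right one. It assumes java.nio.charset.StandardCharsets is available (Java 7 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset mismatch produces U+FFFD.
public class ReplacementCharDemo {

    // Decode raw bytes with the given charset; malformed input is
    // replaced with U+FFFD by the String constructor.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xE9 is "é" in ISO-8859-1 (and windows-1252), but on its own
        // it is not a valid UTF-8 sequence.
        byte[] bytes = {(byte) 0xE9};

        String wrong = decode(bytes, StandardCharsets.UTF_8);
        String right = decode(bytes, StandardCharsets.ISO_8859_1);

        // Print code points rather than glyphs so the result does not
        // depend on the console encoding.
        System.out.printf("as UTF-8:      U+%04X%n", (int) wrong.charAt(0));
        System.out.printf("as ISO-8859-1: U+%04X%n", (int) right.charAt(0));
    }
}
```

U+FFFD is the Unicode replacement character, which many fonts render as a question mark in a black diamond.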

Upvotes: 0

benteh

Reputation: 2288

Oh, please use UTF-8, regardless of whether this problem is solved elsewhere. http://www.w3.org/TR/html4/charset.html http://en.wikipedia.org/wiki/UTF-8

Upvotes: 0

Giulio Piancastelli

Reputation: 15808

InputStreamReader can take a Charset as a second parameter to indicate, I presume, the character encoding of the stream it's going to read. Standard-compliant Java implementations are not required to support the windows-1252 encoding, but I believe it's quite similar to ISO-8859-1 (the two differ only in the byte range 0x80 to 0x9F), which you can try as a first workaround to see if it works. There's also another possibly interesting constructor in the InputStreamReader class that takes a CharsetDecoder as a second parameter (you can create one by invoking newDecoder() on a Charset). You could use it to decode the stream in the encoding you prefer, or perhaps in the system's default encoding, which you can obtain by invoking Charset.defaultCharset().

See the JavaDoc API documentation for InputStreamReader, Charset and CharsetDecoder for details. I'm not an expert and know just a little about encoding and its issues, but I thought it worth pointing out the availability of these classes.

You may also check the encoding used for the InputStreamReader by invoking its getEncoding method.
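
A minimal sketch of the approach described above, using only standard java.io and java.nio.charset classes. The class and method names are invented for the example, and a byte array stands in for the HTTP response stream. It asks for windows-1252 and falls back to ISO-8859-1 if the platform does not provide it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical demo class: read a stream with an explicit charset.
public class CharsetReaderDemo {

    // Prefer windows-1252; fall back to ISO-8859-1, which every Java
    // implementation is required to support.
    static Charset pickCharset() {
        return Charset.isSupported("windows-1252")
                ? Charset.forName("windows-1252")
                : Charset.forName("ISO-8859-1");
    }

    // Read the first line of the stream, decoding bytes with the given charset.
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (InputStreamReader isr = new InputStreamReader(in, charset);
             BufferedReader reader = new BufferedReader(isr)) {
            // getEncoding() reports the encoding name the reader actually uses.
            System.out.println("reader encoding: " + isr.getEncoding());
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "caf" followed by 0xE9, which is "é" in both candidate charsets.
        byte[] body = {'c', 'a', 'f', (byte) 0xE9};
        String line = readFirstLine(new ByteArrayInputStream(body), pickCharset());
        System.out.println(line);
    }
}
```

In the question's code, the same idea would mean passing a Charset as the second argument to the InputStreamReader wrapped around response.getEntity().getContent().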

Upvotes: 2

Jon Skeet

Reputation: 1500515

My guess is that you've just used the InputStreamReader constructor which takes a stream but not a character encoding - so it'll try to use the platform default. You should be using the encoding specified in the response; when you're using HTTP the one in the Content-Type header is likely to be okay, although it's a shame that the HTML can specify it separately :(

Now whether Android contains the Windows-1252 encoding is a different matter...
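
As a rough sketch of "use the encoding specified in the response" (not from the original answer), the helper below pulls the charset parameter out of a Content-Type header value and falls back to a caller-supplied default when the parameter is missing or the platform does not support it. The class and method names are hypothetical. With the HttpClient code in the question, the header value would presumably come from response.getEntity().getContentType(), which can be null.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: choose a Charset from a Content-Type header value.
public class ContentTypeCharset {

    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons.
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip the "charset=" prefix and any surrounding quotes.
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    if (Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // Illegal charset name (for example an empty string).
                }
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The header value from the question; the result depends on whether
        // this platform ships windows-1252.
        Charset cs = charsetFrom("text/html; charset=windows-1252",
                StandardCharsets.ISO_8859_1);
        System.out.println("decoding with: " + cs.name());
    }
}
```

The resulting Charset can then be passed as the second argument to the InputStreamReader constructor instead of relying on the platform default.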

Upvotes: 0
