Pableras84

Reputation: 1195

How to read an InputStream as UTF-8?

Hello all,

I'm developing a Java app that calls a PHP script over the internet, which returns an XML response.

The response contains the word "Próximo", but when I parse the XML nodes and store the value in a String variable, I receive the word like this: "Pr&oacute;ximo".

I'm sure the problem is that my Java app is using a different encoding than the PHP script. So I suppose I must set the encoding to the same one the PHP XML uses, UTF-8.

This is the code I'm using to get the XML file from the PHP script.

What should I change in this code to set the encoding to UTF-8? (Note that I'm not using a BufferedReader; I'm using an InputStream.)

        InputStream in = null;
        String url = "http://www.myurl.com";
        try {                              
            URL formattedUrl = new URL(url); 
            URLConnection connection = formattedUrl.openConnection();   
            HttpURLConnection httpConnection = (HttpURLConnection) connection;
            httpConnection.setAllowUserInteraction(false);
            httpConnection.setInstanceFollowRedirects(true);
            httpConnection.setRequestMethod("GET");
            httpConnection.connect();               
            if (httpConnection.getResponseCode() == HttpURLConnection.HTTP_OK)
                in = httpConnection.getInputStream();   

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();                     
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(in);
            doc.getDocumentElement().normalize();             
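            NodeList myNodes = doc.getElementsByTagName("myNode"); 
        } catch (Exception e) {
            // Covers I/O, parser configuration and SAX parsing errors
            e.printStackTrace();
        }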

Upvotes: 8

Views: 36729

Answers (1)

Jon Lin

Reputation: 143876

Once you have your InputStream, read bytes from it into a byte[]. When you create your Strings from those bytes, pass in the charset name "UTF-8". Example:

byte[] buffer = new byte[contentLength];
int bytesRead = inputStream.read(buffer);
String page = new String(buffer, 0, bytesRead, "UTF-8");

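Note that you're probably going to want to make your buffer some sane size (like 1024) and call inputStream.read(buffer) repeatedly until it returns -1. Because a multi-byte UTF-8 character can be split across two reads, collect all the bytes first and decode them once at the end rather than decoding each chunk separately.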
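As a rough illustration of that read loop, here is a minimal sketch. The class name Utf8StreamReader and the helper readUtf8 are made up for this example, and a ByteArrayInputStream stands in for the HTTP response stream. It assumes the whole response fits comfortably in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question or answer.
class Utf8StreamReader {

    // Collects every byte from the stream, then decodes them as UTF-8 in one step,
    // so a multi-byte character split across two reads is still decoded correctly.
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int bytesRead;
        // read() returns -1 once the end of the stream is reached
        while ((bytesRead = in.read(buffer)) != -1) {
            collected.write(buffer, 0, bytesRead);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for httpConnection.getInputStream(): the UTF-8 bytes of "Próximo"
        byte[] sample = "Pr\u00f3ximo".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(new ByteArrayInputStream(sample));
        System.out.println(text.equals("Pr\u00f3ximo"));
    }
}
```

In the question's code, the stream returned by httpConnection.getInputStream() would take the place of the ByteArrayInputStream.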


@Amir Pashazadeh

Yes, you can also use an InputStreamReader, and try changing the parse() line to:

Document doc = db.parse(new InputSource(new InputStreamReader(in, "UTF-8")));
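For reference, a self-contained sketch of that parse() change might look like the following. The class name Utf8XmlParse, the helper firstNodeText, and the sample XML are assumptions made for illustration; only the InputSource/InputStreamReader line comes from the suggestion above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original question or answer.
class Utf8XmlParse {

    // Parses the stream as UTF-8 text and returns the text of the first matching element,
    // or null if no element with that tag name exists.
    static String firstNodeText(InputStream in, String tagName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Wrapping the stream in a Reader fixes the byte-to-character decoding to UTF-8
        InputSource source = new InputSource(new InputStreamReader(in, StandardCharsets.UTF_8));
        Document doc = db.parse(source);
        doc.getDocumentElement().normalize();
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the PHP response: a small XML document encoded as UTF-8
        String xml = "<root><myNode>Pr\u00f3ximo</myNode></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstNodeText(in, "myNode").equals("Pr\u00f3ximo"));
    }
}
```

Note that when the parser is given a character stream like this, it uses the Reader's decoding rather than any encoding named in the XML declaration, so this only helps if the server really does send UTF-8.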

Upvotes: 9
