thevoipman

Reputation: 1833

Unable to parse UTF-8 XML

My external XML already has this declaration:

<?xml version="1.0" encoding="UTF-8"?>

However, when I try to parse it in my application, the Unicode characters are not read correctly at all.

Here is what I have tried so far, with no luck.

private class MyDownloadTask extends AsyncTask<Void,Void,Void>
{
    String URL = context.getResources().getString(R.string.XML_database_url);
    String KEY_ITEM = "item"; // parent node
    String KEY_NAME = "name";
    String KEY_COST = "location";
    String KEY_DESC = "url";
    ArrayList<RadioListElement> radioArray;

    protected void onPreExecute(final ArrayList<String> userRadios) {
        super.onPreExecute();
        radioArray = new ArrayList<RadioListElement>();
        MainActivity.getDataManager().loadStoredRadioStations(radioArray, userRadios);
    }

    protected Void doInBackground(Void... params) {
        String xml = getXmlFromUrl(URL);
        Document doc = getDomElement(xml);

        NodeList nl = doc.getElementsByTagName(KEY_ITEM);
        for (int i = 0; i < nl.getLength(); i++) {
            Element e = (Element) nl.item(i);
            String name = getValue(e, KEY_NAME);
            String cost = getValue(e, KEY_COST);
            String description = getValue(e, KEY_DESC);
            radioArray.add(new RadioListElement(context, name, cost, description));
        }
        return null;
    }
}

public Document getDomElement(String xml){
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {

            DocumentBuilder db = dbf.newDocumentBuilder();

            InputSource is = new InputSource(is,"UTF-8");
            is.setCharacterStream(new StringReader(xml));

            doc = db.parse(is);

        } catch (ParserConfigurationException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (SAXException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (IOException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        }
        // return DOM
        return doc;
    }

I set UTF-8 here:

                InputSource is = new InputSource(is,"UTF-8");

What am I doing wrong? How can I make this work so that it displays Unicode just fine for me?

Upvotes: 0

Views: 1098

Answers (2)

Santhosh Kumar Tekuri

Reputation: 3020

Do not convert the XML to a String yourself and then feed that String to the DOM parser. XML parsers can work out the encoding themselves from the XML declaration.

I suggest changing getXmlFromUrl(String url) to return the InputStream from httpEntity:

return httpEntity.getContent();

Then pass this InputStream to the DOM parser:

InputSource is = new InputSource(inputStream);

Note that no encoding is set on is.

Now parse is and verify that the Unicode characters come through as expected.
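To make the idea concrete, here is a minimal sketch of parsing straight from a byte stream. It is not the asker's code: the class name StreamParseDemo, the helper names, and the sample XML are all assumptions for illustration, and a ByteArrayInputStream stands in for the stream you would get from httpEntity.getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; names are illustrative only.
class StreamParseDemo {

    // Hands the raw bytes to the parser so it can read the encoding
    // from the XML declaration itself. No encoding is set on the InputSource.
    static Document parse(InputStream in) {
        try {
            InputSource source = new InputSource(in);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the text of the first <name> element in the document.
    static String firstName(Document doc) {
        return doc.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Sample document with a non-ASCII character (e with acute accent).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<item><name>Caf\u00e9 Radio</name></item>";
        // Stand-in for the stream returned by httpEntity.getContent().
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstName(parse(in)));
    }
}
```

The point of the design is that the bytes are never turned into a String before the parser sees them, so there is no intermediate decoding step that could pick the wrong charset.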

Upvotes: 1

thevoipman

Reputation: 1833

I added utf-8 to the code that grabs the XML from the URL. It should look like this:

xml = EntityUtils.toString(httpEntity,"utf-8");

public String getXmlFromUrl(String url) {
    String xml = null;
    try {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        HttpPost httpPost = new HttpPost(url);

        HttpResponse httpResponse = httpClient.execute(httpPost);
        HttpEntity httpEntity = httpResponse.getEntity();
        xml = EntityUtils.toString(httpEntity,"utf-8");

    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (ClientProtocolException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return xml;
}
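For anyone wondering why the explicit charset matters: as far as I understand it, EntityUtils.toString(httpEntity) without a charset argument falls back to ISO-8859-1 when the response's Content-Type header does not name one, so UTF-8 bytes get decoded with the wrong charset before the parser ever sees them. Below is a small sketch of that effect using only the standard library. The class and method names are made up for illustration and the byte array stands in for the HTTP response body.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the decoding mismatch.
class CharsetMismatchDemo {

    // Decodes the body as ISO-8859-1, which is wrong for UTF-8 bytes.
    static String decodeAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Decodes the body as UTF-8, matching how it was encoded.
    static String decodeAsUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Cafe" with an acute accent on the e, encoded as UTF-8 (the accent takes two bytes).
        byte[] body = "Caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Each of the two accent bytes becomes its own character, so the text has 5 chars.
        System.out.println(decodeAsLatin1(body).length());
        // The two accent bytes are combined into one character, so the text has 4 chars.
        System.out.println(decodeAsUtf8(body).length());
    }
}
```

Once the String is garbled like this, setting UTF-8 on the InputSource later cannot repair it, because the StringReader already hands the parser decoded characters.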

Upvotes: 0
