Reputation: 438
I have an XML file, retrieved from a URL, that contains some Arabic characters, so I had to encode it in UTF-8 so that it can handle them.
XML File:
<Entry>
  <lstItems>
    <item>
      <id>1</id>
      <title>News Test 1</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/news1.jpg</img>
    </item>
    <item>
      <id>2</id>
      <title>كريم</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/news2.jpg</img>
    </item>
    <item>
      <id>3</id>
      <title>News Test 333</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/news3.jpg</img>
    </item>
    <item>
      <id>4</id>
      <title>ربيع</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/cont20.jpg</img>
    </item>
    <item>
      <id>5</id>
      <title>News Test 55555</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/cont21.jpg</img>
    </item>
    <item>
      <id>6</id>
      <title>News Test 666666</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/cont22.jpg</img>
    </item>
  </lstItems>
</Entry>
I retrieved the XML from the URL as a String, as shown below:
public String getXmlFromUrl(String url) {
    try {
        return new AsyncTask<String, Void, String>() {
            @Override
            protected String doInBackground(String... params) {
                String xml = null;
                try {
                    DefaultHttpClient httpClient = new DefaultHttpClient();
                    HttpGet httpGet = new HttpGet(params[0]);
                    HttpResponse httpResponse = httpClient.execute(httpGet);
                    HttpEntity httpEntity = httpResponse.getEntity();
                    xml = new String(EntityUtils.toString(httpEntity).getBytes(), "UTF-8");
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return xml;
            }
        }.execute(url).get();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
    return null;
}
The returned String is then passed to this method to build a Document for later use, as shown below:
public Document getDomElement(String xml) {
    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource();
        StringReader xmlstring = new StringReader(xml);
        is.setCharacterStream(xmlstring);
        is.setEncoding("UTF-8");
        // Code stops here!
        doc = db.parse(is);
    } catch (ParserConfigurationException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    } catch (SAXException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    } catch (IOException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    }
    // return DOM
    return doc;
}
An error occurred with this message:
09-18 07:51:40.441: E/Error:(1210): Unexpected token (position:TEXT @1:4 in java.io.StringReader@4144c240)
So the code crashes at the line marked above, with the following error:
09-18 07:51:40.451: E/AndroidRuntime(1210): java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.university1/com.example.university1.MainActivity}: java.lang.NullPointerException
Kindly note that the code works fine with ISO encoding.
Upvotes: 0
Views: 4971
Reputation: 32407
This might not be the problem, but EntityUtils.toString(httpEntity).getBytes() is using the default platform encoding. You should use EntityUtils.toString(httpEntity) as the String directly; there is no need to turn it into bytes.
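A minimal JDK-only sketch of why the getBytes() round trip cannot help, assuming the server sends UTF-8 bytes. HttpClient's EntityUtils.toString(HttpEntity) decodes with the charset declared by the response and falls back to ISO-8859-1 when none is declared; once UTF-8 bytes have been decoded that way the String is already wrong, and re-encoding it with getBytes() and decoding it again does not undo the damage. CharsetDemo, decode and the sample word (كريم, written as Unicode escapes) are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decodes raw bytes with an explicit charset rather than the platform default.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String original = "\u0643\u0631\u064A\u0645"; // كريم
        // Hypothetical response body: the bytes a UTF-8 server would send (2 bytes per letter).
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Wrong: ISO-8859-1 maps each byte to one char, so 4 letters become 8 junk chars.
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);
        // Re-encoding and decoding the already-wrong String just reproduces it.
        String stillWrong = new String(wrong.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        // Right: decode with the charset the bytes were actually written in.
        String right = decode(wire, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original));      // false
        System.out.println(stillWrong.equals(original)); // false
        System.out.println(right.equals(original));      // true
        System.out.println(wrong.length() + " vs " + right.length()); // 8 vs 4
    }
}
```

In the question's code this would suggest replacing the new String(...getBytes(), "UTF-8") line with EntityUtils.toString(httpEntity, "UTF-8"); in HttpClient 4.x that second argument is only a fallback used when the response does not declare a charset.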
Also, read this http://kunststube.net/encoding/ for useful background on what's going on.
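Related detail: the javadoc for org.xml.sax.InputSource says setEncoding has no effect when a character stream is supplied, so is.setEncoding("UTF-8") in the question's getDomElement does nothing once the XML is already a String. A minimal sketch of the byte-based alternative, assuming you keep the raw response bytes instead of converting them to a String first, so the parser does the decoding itself. ByteStreamParse and getDomElementFromBytes are hypothetical names, and the sketch targets the standard JDK parser; Android's parser may differ in details.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ByteStreamParse {
    // Hypothetical variant of getDomElement that takes raw bytes instead of a String.
    static Document getDomElementFromBytes(byte[] body) {
        try {
            InputSource is = new InputSource(new ByteArrayInputStream(body));
            // Honored here, because the parser decodes the byte stream itself.
            is.setEncoding("UTF-8");
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            System.err.println("Error: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Title is ربيع, written as Unicode escapes.
        String xml = "<Entry><item><title>\u0631\u0628\u064A\u0639</title></item></Entry>";
        Document doc = getDomElementFromBytes(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```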
Upvotes: 1
Reputation: 382102
You've added a BOM to your UTF-8 file, which is bad.
Perhaps you edited the file with Notepad; check your editor's settings to make sure it doesn't add a BOM.
As the BOM seems to be inside the text rather than at the start, you also need to remove it by pressing the delete key around its position (it's invisible in most editors). This may have happened during a file concatenation operation.
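If you cannot fix the file at the source right away, a minimal workaround sketch is to strip the BOM before parsing. It assumes the text has already been decoded as UTF-8, so the BOM appears as the single character U+FEFF; if the bytes were decoded as ISO-8859-1 instead, it shows up as the three characters ï»¿ and this would not catch it. It removes every U+FEFF rather than only a leading one, since here the BOM appears to be mid-text. BomStripper, stripBom and parse are hypothetical names; re-saving the file without a BOM remains the real fix.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class BomStripper {
    private static final char BOM = '\uFEFF';

    // Removes every U+FEFF, not just a leading one (e.g. left behind by concatenating files).
    // Caveat: a U+FEFF that was meant to be part of the content is removed too.
    static String stripBom(String s) {
        return s.replace(String.valueOf(BOM), "");
    }

    // Parses the XML after stripping BOMs; wraps checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(stripBom(xml))));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input with a BOM in front of the root element; title is كريم.
        String xml = BOM + "<Entry><item><title>\u0643\u0631\u064A\u0645</title></item></Entry>";
        Document doc = parse(xml);
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```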
Upvotes: 2