Reputation: 1321
I'm having problems with Jsoup. I'm trying to retrieve an XML file from the Open Movie Database using their API in my beta Android app. Their API documentation says that to get XML back, you add "r=" followed by the return type. I've tested with several requests. Here is one of them:
Jsoup.connect("http://www.omdbapi.com/?i=tt1285016&r=xml").get();
It works fine when tested in a browser, but not on Android. No exception is thrown. If I omit the return type, the API returns JSON, and in that case I do receive the data. To check whether the problem is with XML in general, I tested with the MusicBrainz API, which returns XML by default. To my surprise, that works fine.
What is the problem? Is it the return type from the Open Movie Database, or Jsoup?
Upvotes: 4
Views: 1406
Reputation: 10522
Jsoup's primary focus is on dealing with HTML, and on making sure that the returned document is well-formed HTML. So by default it will always treat the input as HTML and will normalise the document. That's why you're getting a DOM like <html><head></head>...<xml>...</html>.
If you know the input you're giving it is actually XML, you can configure Jsoup to parse in XML mode. In that case it will not normalise to a HTML DOM, and it will not enforce any HTML spec rules.
As an example:
String url = "http://www.omdbapi.com/?i=tt1285016&r=xml";
Document doc = Jsoup.connect(url)
        .parser(Parser.xmlParser())
        .get();
System.out.println(doc);
Compare that output with and without the Parser.xmlParser() configuration:
In XML mode:
<?xml version="1.0" encoding="UTF-8"?>
<root response="True">
<movie title="The Social Network" year="2010" {snip} />
</root>
In HTML mode:
<!--?xml version="1.0" encoding="UTF-8"?-->
<html>
<head></head>
<body>
<root response="True">
<movie title="The Social Network" {snip} />
</root>
</body>
</html>
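To read values out of the response, here is a minimal offline sketch. It assumes the response body is already available as a string shaped like the output above; the class name XmlModeSketch and the helpers movieTitle and hasHtmlWrapper are hypothetical names made up for illustration, and the sample string holds only the title and year attributes shown above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class XmlModeSketch {

    // Hypothetical helper: parses the string in XML mode and returns the
    // title attribute of the first <movie> element, or null if there is none.
    static String movieTitle(String xml) {
        // Empty base URI: nothing in this document needs relative URLs resolved.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        Element movie = doc.select("movie").first();
        return movie == null ? null : movie.attr("title");
    }

    // Hypothetical helper: true if the parsed document contains an <html> element.
    static boolean hasHtmlWrapper(Document doc) {
        return !doc.select("html").isEmpty();
    }

    public static void main(String[] args) {
        // Sample data shaped like the XML-mode output above (assumed, not fetched).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<root response=\"True\">"
                + "<movie title=\"The Social Network\" year=\"2010\" />"
                + "</root>";

        System.out.println(movieTitle(xml));

        // XML mode keeps the document as-is; the default HTML mode wraps it.
        System.out.println(hasHtmlWrapper(Jsoup.parse(xml, "", Parser.xmlParser())));
        System.out.println(hasHtmlWrapper(Jsoup.parse(xml)));
    }
}
```

The same select call should also find the movie element in HTML mode, since the XML ends up nested inside body; XML mode just avoids the extra html, head, and body wrappers.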
Upvotes: 2
Reputation: 1321
Found the problem. The values were always there. I don't know why, but what comes back is an HTML document with the XML tags embedded in it. Printing the values to Logcat shows the HTML tags html, head, and body first, and only after them the XML.
Upvotes: 0