Reputation: 91
I'm trying to parse a Wikipedia page with Jsoup, and my code works fine on my computer. The only thing I changed for Android is that the .connect().get() call now runs inside an AsyncTask, but I get only part of the HTML (there is no "body", and only half of the second "script" in "title") and I can't understand why. This is my code for Android:
protected String doInBackground(String... params) {
    try {
        Document doc = Jsoup.connect(params[0]).get();
        return doc.toString();
    } catch (IOException e) {
        //...
        e.printStackTrace();
    }
    return null;
}
And this is the plain version that works on my computer:
String url = "https://en.wikipedia.org/wiki/Protectorate";
Document doc = null;
try {
    doc = Jsoup.connect(url).get();
} catch (IOException e) {
    //...
    e.printStackTrace();
}
I checked that params[0] is https://en.wikipedia.org/wiki/Protectorate, so the URL is not the problem. If you need any extra information, I will of course provide it.
Upvotes: 1
Views: 697
Reputation: 2875
Logcat is fooling you here: it truncates long messages (I assume you checked your string with Logcat? See the related question).
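One way to confirm the download without relying on a long log line at all is to log a one-line summary of the string instead of the string itself. The sketch below is plain Java so it can run off-device; `HtmlSummary` and `summarize` are hypothetical names, and checking for `<body` and a closing `</html>` is only a heuristic assumption about what a complete `doc.toString()` looks like.

```java
public class HtmlSummary {
    // Hypothetical helper: reduces a (possibly huge) HTML string to one short line,
    // so a single Logcat entry is enough to see whether the whole page arrived.
    static String summarize(String html) {
        boolean hasBody = html.contains("<body");             // heuristic: body element present
        boolean closed = html.trim().endsWith("</html>");     // heuristic: document is closed
        return "length=" + html.length()
                + ", hasBody=" + hasBody
                + ", endsWithHtmlClose=" + closed;
    }

    public static void main(String[] args) {
        String complete = "<html><head><title>t</title></head><body><p>x</p></body></html>";
        String cut = complete.substring(0, 20); // simulates a truncated string
        System.out.println(summarize(complete));
        System.out.println(summarize(cut));
    }
}
```

On Android you would pass `doc.toString()` to `summarize` and log the result with a single `Log` call; if it reports the body and the closing tag, the page was downloaded completely and only the log output was cut.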
If you split the result string into chunks, you will see that the whole page was loaded. Try adding something like this logAll function to your AsyncTask class to see the full output:
// Imports needed in the enclosing file:
// import android.os.AsyncTask;
// import android.util.Log;
// import java.io.IOException;
// import org.jsoup.Jsoup;
// import org.jsoup.nodes.Document;

private class DownloadTask extends AsyncTask<String, Integer, String> {
    @Override
    protected String doInBackground(String... params) {
        try {
            Document doc = Jsoup.connect(params[0]).get();
            return doc.toString();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Download failed: return null instead of calling toString() on a null Document
        return null;
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        if (s != null) {
            // Log the whole result in chunks so Logcat does not truncate it
            logAll("async", s);
        }
    }

    // Logs longString in pieces of at most splitSize characters each
    void logAll(String TAG, String longString) {
        int splitSize = 300;
        for (int index = 0; index < longString.length(); index += splitSize) {
            int end = Math.min(index + splitSize, longString.length());
            Log.e(TAG, longString.substring(index, end));
        }
    }
}
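The splitting done in logAll does not depend on Android, so it can be checked on the desktop. A minimal plain-Java sketch (`Chunker` and `chunks` are hypothetical names) that also shows that joining the pieces gives back the original string, i.e. nothing is lost by chunking:

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Cuts s into pieces of at most `size` characters; the last piece holds the remainder.
    static List<String> chunks(String s, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i += size) {
            out.add(s.substring(i, Math.min(s.length(), i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        String page = "x".repeat(1000); // stand-in for doc.toString() (needs Java 11+)
        List<String> parts = chunks(page, 300);
        System.out.println(parts.size());                        // 4 (300 + 300 + 300 + 100)
        System.out.println(String.join("", parts).equals(page)); // true
    }
}
```

Unlike the original logAll, this version returns no chunks for an empty string rather than logging one empty line.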
Upvotes: 1