Jsoup.connect().get() takes only part of html file on Android

Question

I am trying to parse Wikipedia, and my code works fine on a desktop computer. The only thing I changed for Android is that the .connect().get() call now runs inside an AsyncTask, but I get only part of the HTML file (no "body", only half of the second "script" in "title") and I can't understand why. This is my code for Android:

protected String doInBackground(String... params) {
    try {
        Document doc = Jsoup.connect(params[0]).get();
        return doc.toString();
    } catch (IOException e) {
        //...
        e.printStackTrace();
    }
    return null;
}

And this is the simple desktop version, which works:

String url = "https://en.wikipedia.org/wiki/Protectorate";
Document doc = null;
try {
    doc = Jsoup.connect(url).get();
} catch (IOException e) {
    //...
    e.printStackTrace();
}

I checked: params[0] is https://en.wikipedia.org/wiki/Protectorate, so there is no mistake there. If you need any extra information, I will of course provide it.

Frederic Klein · Accepted Answer

Logcat is misleading here, because it truncates long messages (I assume you checked your string in logcat; see the related question).

If you split your result string into chunks, you will see that the whole page was loaded. Try adding something like this logAll function to your AsyncTask class to see the full output:

// Needs: android.os.AsyncTask, android.util.Log, org.jsoup.Jsoup, org.jsoup.nodes.Document
private class DownloadTask extends AsyncTask<String, Void, String> {

    @Override
    protected String doInBackground(String... params) {
        try {
            // Fetch and parse the page off the main thread.
            Document doc = Jsoup.connect(params[0]).get();
            return doc.toString();
        } catch (Exception e) {
            e.printStackTrace();
            return null; // nothing was loaded; onPostExecute checks for this
        }
    }

    @Override
    protected void onPostExecute(String html) {
        super.onPostExecute(html);
        if (html != null) {
            logAll("async", html);
        }
    }

    // Logs a long string in pieces, because logcat truncates long messages.
    void logAll(String tag, String longString) {
        int splitSize = 300;
        for (int index = 0; index < longString.length(); index += splitSize) {
            int end = Math.min(index + splitSize, longString.length());
            Log.e(tag, longString.substring(index, end));
        }
    }
}
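
For illustration, the chunking idea can also be pulled out into a plain Java helper that is easy to check outside Android. This is only a minimal sketch and not part of the original answer: the class name LogChunker and the method splitIntoChunks are hypothetical, the HTML string is a short stand-in for doc.toString(), and 300 simply mirrors the splitSize used above.

```java
import java.util.ArrayList;
import java.util.List;

class LogChunker {

    // Splits text into consecutive pieces of at most chunkSize characters.
    public static List<String> splitIntoChunks(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for doc.toString(); a real page would be far longer.
        String html = "<html><head><title>Protectorate</title></head><body>...</body></html>";
        for (String chunk : splitIntoChunks(html, 300)) {
            System.out.println(chunk); // on Android: Log.d("async", chunk)
        }
    }
}
```

Joining the returned chunks back together gives the original string, so nothing is lost by logging piece by piece. Comparing the string's length() with what logcat displays is another quick way to confirm that only the log output was cut off.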
