Eli

Reputation: 262

JSoup skipping elements Android

JSoup seems to be skipping some elements in my HTML string. I am certain that everything is present in the HTML string, but when I select elements to parse, JSoup returns only some of them, or none at all. Here is my code. Thanks.

public void parseDoc() {
    final HttpParams params = new BasicHttpParams();
    HttpClientParams.setRedirecting(params, true);
    // Pass the params to the client; otherwise the redirect setting is never used.
    HttpClient httpclient = new DefaultHttpClient(params);
    HttpPost httppost = new HttpPost(
            "https://secure.groupfusion.net/processlogin.php");
    String HTML = "";
    try {
        List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(3);
        nameValuePairs.add(new BasicNameValuePair("referral_page",
                "/modules/gradebook/ui/gradebook.phtml?type=student_view"));
        nameValuePairs.add(new BasicNameValuePair("currDomain",
                "beardenhs.knoxschools.org"));
        nameValuePairs.add(new BasicNameValuePair("username", username
                .getText().toString()));
        nameValuePairs.add(new BasicNameValuePair("password", password
                .getText().toString()));
        httppost.setEntity(new UrlEncodedFormEntity(nameValuePairs));

        HttpResponse response = httpclient.execute(httppost);

        HTML = EntityUtils.toString(response.getEntity());
        Document doc = Jsoup.parse(HTML);
        Element link = doc.select("a").first();
        String linkHref = link.attr("href");
        HttpGet request = new HttpGet();
        try {
            request.setURI(new URI(linkHref));
        } catch (URISyntaxException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        response = httpclient.execute(request);
        String html = "";
        InputStream in = response.getEntity().getContent();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in));
        StringBuilder str = new StringBuilder();
        String line = null;
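        while ((line = reader.readLine()) != null) {
            // readLine() strips the line break; restore it so that tags and
            // attributes split across lines are not glued together.
            str.append(line).append('\n');
        }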
        in.close();
        HTML = str.toString();
        doc = Jsoup.parse(HTML);
        Elements divs = doc.select("div.yuiTop");
        for (Element d: divs) {
            sting.append(d.text());
            sting.append("\n");
        }


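    } catch (ClientProtocolException e) {
        // Don't swallow failures silently; an empty result may just be a hidden exception.
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }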

}

Upvotes: 1

Views: 1034

Answers (1)

Neil Traft

Reputation: 19726

This code is a bit unusual. As I understand it, it posts a login request, scrapes the first hyperlink out of the response, and then tries to load the page that link points to. Are you sure the server is returning a valid hyperlink? Try loading that URL in your browser.
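
One way to check the link in code rather than in a browser is to resolve the scraped href against the URL of the page it came from and see whether the result is a fetchable http(s) address. The following is only a sketch using java.net.URI from the standard library; the class and method names (HrefCheck, resolveHref, isFetchable) are made up for illustration, and the sample href is an assumption (it reuses the referral_page value from the question, since the actual first link in the response is not known).

```java
import java.net.URI;

public class HrefCheck {
    // Hypothetical helper: resolve a scraped href against the URL of the page
    // it was found on. A relative href such as "/modules/..." becomes absolute.
    // URI.create throws IllegalArgumentException if the href is not a legal
    // URI (for example, if it contains unescaped spaces).
    static URI resolveHref(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim());
    }

    // Hypothetical helper: true only for links with a host and an http(s) scheme.
    static boolean isFetchable(URI uri) {
        String scheme = uri.getScheme();
        return uri.getHost() != null
                && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
    }

    public static void main(String[] args) {
        // Page URL taken from the question; the href is an assumed example.
        String pageUrl = "https://secure.groupfusion.net/processlogin.php";
        String href = "/modules/gradebook/ui/gradebook.phtml?type=student_view";

        URI target = resolveHref(pageUrl, href);
        System.out.println(target + " fetchable=" + isFetchable(target));
    }
}
```

If the first link turns out to be relative, or something like a javascript: link, then new URI(linkHref) in the question would not give HttpGet a usable address. Within jsoup itself, parsing with a base URI via Jsoup.parse(html, baseUri) and reading link.absUrl("href") should give a similar result.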

If the link is valid, I'm not sure what the issue is. But why not use WebView.loadUrl() and let the browser component take care of it?

Upvotes: 1
