OBX

Reputation: 6114

How to retrieve a specific table from a webpage using Jsoup [Android]

I am trying to retrieve a table from this URL. This is the table I need to retrieve:

<table id="h2hSum" class="competitionRanking tablesorter">
  <thead>
    <tr>
      <th align="center">Team</th>
      <th align="center">Played</th>
      <th align="center">Win</th>
      <th align="center">Draw</th>
      <th align="center">Lose</th>
      <th align="center">Score</th>
      <th>Goals Scored</th>
      <th>Goals Allowed</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a class="teamLink" href="/soccer-statistics/England/Premier-League-2016-2017/team_info_overall/676_Manchester_City_FC">Manchester City</a></td>
      <td>140</td>
      <td>47</td>
      <td>38</td>
      <td>55</td>
      <td>188:205</td>
      <td>1.34</td>
      <td>1.46</td>
    </tr>
    <tr class="odd">
      <td><a class="teamLink" href="/soccer-statistics/England/Premier-League-2016-2017/team_info_overall/661_Chelsea_FC">Chelsea</a></td>
      <td>140</td>
      <td>55</td>
      <td>38</td>
      <td>47</td>
      <td>205:188</td>
      <td>1.46</td>
      <td>1.34</td>
    </tr>
  </tbody>
</table>

This is what I tried:

private class SimpleTask1 extends AsyncTask<String, String, String>
{
    ProgressDialog loader;

    @Override
    protected void onPreExecute()
    {
        // The second ProgressDialog constructor argument is a theme, not a style,
        // so the spinner style is set separately
        loader = new ProgressDialog(MainActivity.this);
        loader.setProgressStyle(ProgressDialog.STYLE_SPINNER);
        loader.setMessage("loading engine");
        loader.show();
    }

    @Override
    protected String doInBackground(String... urls)
    {
        String result1 = "";
        try {
            Document doc = Jsoup.connect(urls[0]).get();
            // first() returns null if the downloaded page contains no matching table
            Element table = doc.select("table[class=competitionRanking tablesorter]").first();
            Iterator<Element> ite = table.select("td").iterator();

            ite.next(); // skip the first cell (the team name)
            Log.w("Value 1: ", "" + ite.next().text());
            Log.w("Value 2: ", "" + ite.next().text());
            Log.w("Value 3: ", "" + ite.next().text());
            Log.w("Value 4: ", "" + ite.next().text());
        } catch (IOException e) {
            Log.e("SimpleTask1", "Download failed", e);
        }
        return result1;
    }

    @Override
    protected void onPostExecute(String sampleVal)
    {
        loader.dismiss();
        Log.e("OUTPUT", "" + sampleVal);
    }
}

However, this throws an exception. I tried answers to similar questions, but they don't fit my case because the tables there are accessed by their class name or td width. What should I do to access all the values in this table? Kindly help.
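The code above reads the cells one next() call at a time from a flat td selection. Once table is non-null, that selection contains the cells of both body rows in document order (16 cells for the table shown; the header row uses th, so it is not included). A minimal sketch, assuming every row has exactly 8 cells and that the texts were first copied into a plain List<String> (for example by calling text() on each element), regroups them into rows; CellGrouper and toRows are hypothetical names, not part of Jsoup:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that regroups a flat list of table cells into rows. */
final class CellGrouper {
    private CellGrouper() {}

    /**
     * Splits the cell texts (in document order) into consecutive rows of columnCount cells.
     * Assumes every row has the same number of cells.
     */
    static List<List<String>> toRows(List<String> cells, int columnCount) {
        if (columnCount <= 0 || cells.size() % columnCount != 0) {
            throw new IllegalArgumentException(
                    cells.size() + " cells cannot be split into rows of " + columnCount);
        }
        List<List<String>> rows = new ArrayList<>();
        for (int start = 0; start < cells.size(); start += columnCount) {
            // Copy each slice so the rows do not depend on the original list
            rows.add(Collections.unmodifiableList(
                    new ArrayList<>(cells.subList(start, start + columnCount))));
        }
        return rows;
    }
}
```

With the 16 cell texts of the table above, CellGrouper.toRows(cells, 8) returns two rows whose first elements are "Manchester City" and "Chelsea". Failing fast on a cell count that doesn't divide evenly is a deliberate choice: it surfaces a changed page layout instead of silently misaligning columns.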

Upvotes: 1

Views: 283

Answers (2)

Frederic Klein

Reputation: 2876

Problem

The line Iterator<Element> ite = table.select("td").iterator(); throws a NullPointerException.

Reason

After your first visit, the site seems to store your IP address and, if your activity looks like a bot's, asks you to register on the next visit. The registration page you are redirected to doesn't contain the table, so first() returns null, table is null, and calling select(...) on null throws the exception.

Solution

Register for the service and add the login procedure to your code, or use proxies to switch IP address whenever you are redirected to the registration page. I'm not sure how long an IP stays blocked, but using a VPN and the following code I had no problems running 20 consecutive queries. Make sure to set the user agent, cookies and any other header fields that the browser sends in its original request to the site (you can inspect them with the network tab of the browser's developer tools):

Code

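import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";

// First request: open the league page like a browser would and keep its cookies
Response res = Jsoup
        .connect("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/")
        .followRedirects(true).userAgent(userAgent).referrer("http://www.soccerpunter.com")
        .method(Method.GET).header("Host", "www.soccerpunter.com").execute();

// Second request: same user agent, plus the cookies from the first response
Document doc = Jsoup
        .connect("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/head_to_head_statistics/all/676_Manchester_City_FC/661_Chelsea_FC")
        .userAgent(userAgent).timeout(10000).header("Host", "www.soccerpunter.com")
        .cookies(res.cookies())
        .referrer("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/")
        .get();
// first() returns null if we were redirected to a page without the table
Element table = doc.select("table.competitionRanking.tablesorter").first();
if (table == null) {
    throw new IllegalStateException("Table not found, probably redirected to the registration page");
}
Elements td = table.select("td");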
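In the code above, res.cookies() and .cookies(...) are what carry the session from the first request to the second. Purely to illustrate what that means at the HTTP level, here is a minimal stdlib-only sketch (an assumption for illustration, not how Jsoup implements it) that turns the Set-Cookie header values of a first response into the Cookie header value for the next request; CookieCarry and cookieHeader are hypothetical names:

```java
import java.net.HttpCookie;
import java.util.List;

/** Hypothetical helper showing how cookies from one response are sent back in the next request. */
final class CookieCarry {
    private CookieCarry() {}

    /** Builds a Cookie request header value from the Set-Cookie header values of a previous response. */
    static String cookieHeader(List<String> setCookieHeaders) {
        StringBuilder sb = new StringBuilder();
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                // Only name=value is sent back; attributes such as path stay on the client
                if (sb.length() > 0) {
                    sb.append("; ");
                }
                sb.append(cookie.getName()).append('=').append(cookie.getValue());
            }
        }
        return sb.toString();
    }
}
```

For example, the made-up header values "PHPSESSID=abc123; path=/" and "lang=en" produce "PHPSESSID=abc123; lang=en". Without this step the second request looks like a fresh visitor, which is one plausible reason the site treats it differently.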

Upvotes: 1

soorapadman

Reputation: 4509

Try this:

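// s holds the page's HTML as a String
Document document = Jsoup.parse(s);
Element table = document.select("table[class=competitionRanking tablesorter]").first();
// first() returns null if the page does not contain the table
if (table != null) {
    for (Element row : table.select("tr")) {
        for (Element td : row.select("td")) {
            System.out.println(td.text());
        }
    }
}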
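Both answers end with the cell texts as plain strings. A minimal sketch of the next step, assuming the column order shown in the question's table (Team, Played, Win, Draw, Lose, Score, Goals Scored, Goals Allowed), turns one row's texts into typed values; HeadToHeadRow and fromCells are hypothetical names. It also splits Score into goals for and against. In the sample rows, Goals Scored and Goals Allowed match those totals divided by Played (188 / 140 ≈ 1.34, 205 / 140 ≈ 1.46), so the sketch assumes that relationship when it recomputes them:

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical typed view of one body row of the h2hSum table. */
final class HeadToHeadRow {
    final String team;
    final int played, win, draw, lose;
    final int goalsFor, goalsAgainst; // the two halves of the Score column, e.g. "188:205"

    private HeadToHeadRow(String team, int played, int win, int draw, int lose,
                          int goalsFor, int goalsAgainst) {
        this.team = team;
        this.played = played;
        this.win = win;
        this.draw = draw;
        this.lose = lose;
        this.goalsFor = goalsFor;
        this.goalsAgainst = goalsAgainst;
    }

    /** Builds a row from its cell texts, assuming the question's column order. */
    static HeadToHeadRow fromCells(List<String> cells) {
        if (cells.size() != 8) {
            throw new IllegalArgumentException("Expected 8 cells, got " + cells.size());
        }
        String[] score = cells.get(5).split(":");
        if (score.length != 2) {
            throw new IllegalArgumentException("Unexpected score format: " + cells.get(5));
        }
        return new HeadToHeadRow(cells.get(0).trim(),
                Integer.parseInt(cells.get(1).trim()),
                Integer.parseInt(cells.get(2).trim()),
                Integer.parseInt(cells.get(3).trim()),
                Integer.parseInt(cells.get(4).trim()),
                Integer.parseInt(score[0].trim()),
                Integer.parseInt(score[1].trim()));
    }

    /** Goals scored per game, with two decimals like the Goals Scored column. */
    String goalsScoredPerGame() {
        return String.format(Locale.ROOT, "%.2f", (double) goalsFor / played);
    }

    /** Goals allowed per game, with two decimals like the Goals Allowed column. */
    String goalsAllowedPerGame() {
        return String.format(Locale.ROOT, "%.2f", (double) goalsAgainst / played);
    }
}
```

With the row loop above, each row's td texts could be collected into a List<String> and passed to fromCells. Comparing goalsScoredPerGame() with the Goals Scored cell is a cheap sanity check that the columns were read in the expected order; Locale.ROOT keeps the decimal point from turning into a comma on devices with other locales.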

Upvotes: 0
