David Velasquez

Reputation: 2366

Jsoup gets all HTML data in an Android app but not in a Java console application

In my Android app, I scrape some data in an AsyncTask. It works perfectly and Jsoup retrieves the entire document correctly. But when I run the same Jsoup code in a Java console application, it connects to the ESPN website but does not seem to get the same document: the games object is always empty (its size is always 0). For some reason, in the console application the call document.select("section.sb-score"); finds no matching elements in the HTML, while on Android it does.

Here is the Android code, which works fine:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NBAScraper extends GameScraper  //GameScraper extends AsyncTask
{
    public NBAScraper(DateTime date)
    {
        super(date);
        mUrl = "http://www.espn.com/nba/scoreboard/_/date/" + mDateStr; //mDateStr format: yyyyMMdd
    }

    @Override
    protected GameSorterHelper doInBackground(Void... voids)
    {
        GameSorterHelper gsh = new GameSorterHelper();
        try
        {
            //fetch and parse the scoreboard page, then select one element per game
            Document document = Jsoup.connect(mUrl).get();
            games = document.select("section.sb-score"); //games: Elements field, presumably declared in GameScraper
            if(games.size() == 0)
                return null;
        } catch (IOException e)
        {
            e.printStackTrace();
            return null;
        }

        //do stuff with gsh object...
        return gsh;
    }
}

Here is the console application I've tried:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Main
{
    public static void main(String[] args)
    {
        String url = "http://www.espn.com/nba/scoreboard/_/date/20170225";
        try
        {
            Document document = Jsoup.connect(url)
                    .maxBodySize(0) //0 = no limit on the downloaded body size
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
                    .get();
            Elements games = document.select("section.sb-score");
            System.out.println(games.size());

            if (games.size() == 0)
                System.out.println("games size is 0");
            else
                System.out.println("games exist");

        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

As you can see, I've tried setting maxBodySize to 0, which removes the limit on the downloaded document size, and setting the userAgent. Neither fixes it. Of course, I've also tried it without either option set, but that doesn't work either.
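One way to check whether the server is sending different markup (rather than Jsoup cutting the document short) is to fetch the same URL with several user-agent strings and compare the match count and the HTML length. The sketch below assumes jsoup is on the classpath; the class name UserAgentComparison and the helper countGames are hypothetical, and ESPN's markup today may no longer contain section.sb-score at all.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class UserAgentComparison {
    // Selector taken from the question.
    static final String GAME_SELECTOR = "section.sb-score";

    /** Counts elements matching the scoreboard selector in an already-parsed document. */
    static int countGames(Document document) {
        return document.select(GAME_SELECTOR).size();
    }

    public static void main(String[] args) {
        String url = "http://www.espn.com/nba/scoreboard/_/date/20170225";

        // Hypothetical comparison set: the desktop UA from the question
        // and the Android UA suggested in the answer.
        Map<String, String> agents = new LinkedHashMap<>();
        agents.put("desktop Chrome", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        agents.put("Android", "Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) "
                + "AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30");

        for (Map.Entry<String, String> entry : agents.entrySet()) {
            try {
                Document document = Jsoup.connect(url)
                        .userAgent(entry.getValue())
                        .maxBodySize(0) // no body-size limit, so truncation is ruled out
                        .get();
                // A large HTML length with zero matches points to different markup, not truncation.
                System.out.printf("%s: %d games, %d chars of HTML%n",
                        entry.getKey(), countGames(document), document.html().length());
            } catch (IOException e) {
                System.out.println(entry.getKey() + ": request failed: " + e.getMessage());
            }
        }
    }
}
```

If the two lines report similar HTML lengths but different game counts, the server is most likely choosing the page layout based on the User-Agent header.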

Does anyone know why this is happening and how I can get it to work in the console application? Thank you!

Upvotes: 0

Views: 118

Answers (1)

TDG

Reputation: 6171

It looks like it has something to do with the user-agent string. I had to use the following Android user agent to get it to work from my PC:

Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
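Applied to the console program from the question, the change is just the userAgent(...) argument. The sketch below is a minimal version under the assumption that ESPN still serves the section.sb-score markup to this Android UA (not guaranteed today); it needs jsoup on the classpath, and the names EspnScoreboard, scoreboardUrl and connect are hypothetical helpers, with the date formatted as yyyyMMdd like mDateStr in the Android code.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class EspnScoreboard {
    // Android user agent from the answer above.
    static final String ANDROID_UA = "Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) "
            + "AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30";

    // Same yyyyMMdd format the Android code uses for mDateStr.
    private static final DateTimeFormatter URL_DATE = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Builds the scoreboard URL for a given day. */
    static String scoreboardUrl(LocalDate date) {
        return "http://www.espn.com/nba/scoreboard/_/date/" + date.format(URL_DATE);
    }

    /** Prepares (but does not send) a request that identifies itself as an Android browser. */
    static Connection connect(String url) {
        return Jsoup.connect(url)
                .userAgent(ANDROID_UA)
                .maxBodySize(0); // no limit on the downloaded body size
    }

    public static void main(String[] args) {
        String url = scoreboardUrl(LocalDate.of(2017, 2, 25));
        try {
            Document document = connect(url).get(); // the network request happens here
            Elements games = document.select("section.sb-score");
            System.out.println(games.isEmpty() ? "games size is 0" : games.size() + " games found");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Keeping the request setup in one helper means the Android and console code can share the same user agent, so both receive the same page layout.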

Upvotes: 2
