Reputation: 2366
In my Android app, I scrape some data in an AsyncTask. It works perfectly, and Jsoup retrieves the entire document. But when I run the same Jsoup code in a Java console application, it connects to the ESPN website but doesn't get the entire document: the games object is always empty (its size is always 0). For some reason, in the console application, document.select("section.sb-score") does not find this data in the HTML, but on Android it does.
Here is the Android code, which works fine:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NBAScraper extends GameScraper // GameScraper extends AsyncTask
{
    public NBAScraper(DateTime date)
    {
        super(date);
        mUrl = "http://www.espn.com/nba/scoreboard/_/date/" + mDateStr; // mDateStr format: yyyyMMdd
    }

    @Override
    protected GameSorterHelper doInBackground(Void... voids)
    {
        GameSorterHelper gsh = new GameSorterHelper();
        try
        {
            Document document = Jsoup.connect(mUrl).get();
            games = document.select("section.sb-score"); // games: an Elements field, presumably declared in GameScraper
            if (games.size() == 0)
                return null;
        } catch (IOException e) // the exception variable was missing in the original
        {
            e.printStackTrace();
            return null;
        }
        // do stuff with gsh object...
        return gsh;
    }
}
Here is the console application I've tried:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Main
{
    public static void main(String[] args)
    {
        String url = "http://www.espn.com/nba/scoreboard/_/date/20170225";
        try
        {
            Document document = Jsoup.connect(url)
                    .maxBodySize(0) // 0 = no limit on the downloaded body size
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
                    .get();
            Elements games = document.select("section.sb-score");
            System.out.println(games.size());
            if (games.size() == 0)
                System.out.println("games size is 0");
            else
                System.out.println("games exist");
        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
As you can see, I've tried setting maxBodySize to 0 (which removes the limit on the download size) and setting the userAgent. Neither fixes it. I've also tried it without either option set, but that doesn't work either.
Does anyone know why this is happening and how I can get it to work in the console application? Thank you!
Upvotes: 0
Views: 118
Reputation: 6171
It looks like it has something to do with the userAgent string. I had to use the following Android user-agent string to get it to work from my PC:
Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
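One way to check the user-agent explanation independently of Jsoup is to fetch the same scoreboard URL twice, once with the desktop UA from the question and once with the Android UA above, and compare the raw responses. This is only a hedged diagnostic sketch, not part of either post: it assumes Java 11+ and uses the JDK's java.net.http client instead of Jsoup; the class and method names (UserAgentCheck, buildRequest, mentionsScoreClass) are hypothetical; and the "sb-score" substring test is a crude stand-in for Jsoup's select("section.sb-score").

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical diagnostic: compare what the server returns for two User-Agent strings.
public class UserAgentCheck {
    // Desktop Chrome UA string taken from the question's console app
    static final String DESKTOP_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36";

    // Android UA string taken from the answer
    static final String ANDROID_UA =
            "Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) "
            + "AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30";

    // Builds a GET request that sends the given User-Agent header
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    // Crude text check: does the raw HTML mention the sb-score class at all?
    // This is NOT a CSS selector match; Jsoup's select("section.sb-score") is stricter.
    static boolean mentionsScoreClass(String html) {
        return html.contains("sb-score");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = "http://www.espn.com/nba/scoreboard/_/date/20170225";
        // Follow redirects (e.g. http -> https), as Jsoup does by default
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        for (String ua : new String[] {DESKTOP_UA, ANDROID_UA}) {
            HttpResponse<String> response =
                    client.send(buildRequest(url, ua), HttpResponse.BodyHandlers.ofString());
            System.out.println(ua.substring(0, 40) + "... -> status " + response.statusCode()
                    + ", length " + response.body().length()
                    + ", mentions sb-score: " + mentionsScoreClass(response.body()));
        }
    }
}
```

If only the Android UA response mentions sb-score, that would support the idea that the server varies its HTML by User-Agent. In the question's Jsoup console code, the corresponding change is passing the Android string to .userAgent(...) instead of the desktop Chrome string.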
Upvotes: 2