Reputation: 2419
I'm trying to use Jsoup to scrape the following URL:
http://translink.com.au//travel-information/service-notices/25611/details
I use the selector #content-left-column > div.content
but the results are inconsistent: sometimes I get no results, and sometimes I get the expected elements.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSelectorMain {
    // Returns every element in the document that matches the CSS query.
    public static Elements getAlertsElements(Document document, String query) {
        return document.select(query);
    }

    public static void main(String[] args) {
        Document doc;
        try {
            doc = Jsoup.connect("http://translink.com.au//travel-information/service-notices/25611/details").get();
        } catch (IOException e) {
            e.printStackTrace();
            return; // nothing to select from if the request failed
        }
        String query = "#content-left-column > div.content";
        Elements elements = getAlertsElements(doc, query);
        for (Element element : elements) {
            System.out.println(element);
            System.out.println();
        }
        System.out.println("size=" + elements.size());
    }
}
I also tried timeout(0), but that is not the issue. I checked the Jsoup known issues as well but couldn't find a similar case.
What am I missing here?
Upvotes: 1
Views: 228
Reputation: 3159
I think it's because the site sometimes treats the request as coming from a mobile user agent, and that is probably what's causing the inconsistency in your results. I created a new project in Eclipse, and in debug mode I found that the URL had been changed to http://mobile.translink.com.au//travel-information/service-notices/25611/details
And then I changed this statement:
doc = Jsoup.connect("http://translink.com.au//travel-information/service-notices/25611/details").timeout(0).get();
To this:
doc = Jsoup.connect("http://translink.com.au//travel-information/service-notices/25611/details").timeout(0).userAgent("Chrome").get();
...so that the site treats the request as coming from a non-mobile (desktop) user agent.
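If you want to confirm the redirect without stepping through a debugger, something like the following rough sketch may help. It deliberately uses the JDK's HttpURLConnection rather than Jsoup, so the default User-Agent it sends is not necessarily the one Jsoup sends, and the site may respond differently. The class name RedirectCheck and the helpers isMobileHost and finalUrl are made up for illustration, and the "mobile." prefix check is only an assumption based on the redirected URL I saw above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectCheck {

    // Hypothetical helper: true if the URL's host starts with "mobile."
    // (an assumption based on the mobile.translink.com.au redirect seen above).
    static boolean isMobileHost(String url) {
        String host = URI.create(url).getHost();
        return host != null && host.startsWith("mobile.");
    }

    // Hypothetical helper: requests the page with the given User-Agent
    // (null keeps the JDK default) and returns the URL after any redirects.
    // HttpURLConnection follows redirects by default, but not across
    // protocols (for example http to https).
    static String finalUrl(String url, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            conn.getResponseCode();            // sends the request
            return conn.getURL().toString();   // URL after redirects
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "http://translink.com.au//travel-information/service-notices/25611/details";
        // Compare the default User-Agent with the "Chrome" string used above.
        for (String userAgent : new String[] {null, "Chrome"}) {
            String landed = finalUrl(url, userAgent);
            System.out.println("User-Agent=" + userAgent
                    + " -> " + landed
                    + " (mobile host: " + isMobileHost(landed) + ")");
        }
    }
}
```

If the first line reports a mobile host and the second does not, that would support the user-agent explanation. If both land on the same host, the cause is probably something else.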
Upvotes: 1