Pete

Reputation: 514

Jaunt webcrawler - cannot visit next page of Google search results

import com.jaunt.*;
public class JauntCrawler{
  public static void main(String[] args){
    try{
        UserAgent userAgent = new UserAgent();         //create new userAgent (headless browser)
        userAgent.visit("http://google.de");          //visit google
        userAgent.doc.apply("schmetterlinge");            //apply form input (starting at first editable field)
        userAgent.doc.submit();         //click submit button labelled "Google Search"


        Elements links = userAgent.doc.findEvery("<h3 class=r>").findEvery("<a>");  //find search result links
        for(Element link : links) System.out.println(link.getAt("href"));           //print results

        if(userAgent.doc.nextPageLinkExists()) {
            userAgent.visit(userAgent.doc.nextPageLink().getHref());
            Elements newlinks = userAgent.doc.findEvery("<h3 class=r>").findEvery("<a>");
            System.out.println("\nPage 2:");
            for(Element link : newlinks) System.out.println(link.getAt("href"));
        }
    }
    catch(JauntException e){         //if an HTTP/connection error occurs, handle JauntException.
      System.err.println(e);
    }
  }
}

I want to retrieve more Google search results than just the first page. The second for-loop should print the results of the next page, but it doesn't. Any idea why?

Upvotes: 1

Views: 1054

Answers (1)

Zeeshan Amber

Reputation: 151

I ran into the same problem: the user agent does not go to the next page. I found another way to achieve this, though, which is to request each results page directly by setting the start parameter in the search URL:

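// searchString is a placeholder for your URL-encoded query, e.g. "schmetterlinge"
Elements nextLinks = userAgent.doc.findEvery("<a class=fl>");   // used here to count the further result pages
for(int i=0;i<nextLinks.size();i++) {
    // request each following page directly: start=10 is page 2, start=20 is page 3, ...
    userAgent.visit("http://google.co.in/search?q="+searchString+"&start="+(i+1)*10);
    links = userAgent.doc.findEvery("<h3 class=r>").findEvery("<a>");   // reuses the links variable from the question's code
    for(Element link : links) System.out.println(link.getAt("href"));
}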
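
The URL arithmetic in the loop above can be pulled out into a small helper, which also takes care of encoding the query. This is only a sketch using the Java standard library: the class name SearchPageUrls and its methods are hypothetical, and the 10-results-per-page step and the start parameter are taken from the snippet above rather than from any documented Google API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the URLs of consecutive result pages.
class SearchPageUrls {
    // Assumption taken from the answer above: each page holds 10 results.
    static final int RESULTS_PER_PAGE = 10;

    // pageIndex 0 is the first page (start=0), 1 the second (start=10), ...
    static String pageUrl(String query, int pageIndex) {
        try {
            String encoded = URLEncoder.encode(query, "UTF-8"); // spaces become '+'
            return "http://google.co.in/search?q=" + encoded
                    + "&start=" + pageIndex * RESULTS_PER_PAGE;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java runtime
            throw new IllegalStateException(e);
        }
    }

    // URLs for pages 0 .. pageCount-1, in order.
    static List<String> pageUrls(String query, int pageCount) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            urls.add(pageUrl(query, i));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : pageUrls("schmetterlinge", 3)) {
            System.out.println(url);
        }
    }
}
```

Each returned URL can then be passed to userAgent.visit(...) in place of the string concatenation in the loop.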

Upvotes: 1
