Vic

Reputation: 93

Bing Search with Jsoup - how can I avoid captcha?

keywordexist = false;
try {
    res = Jsoup
            .connect(
                    bingSearchUrl.replaceAll("keyword", "intitle:\""
                            + keyword + "\""))
            .userAgent(
                    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.15 (KHTML, like Gecko) Chrome/24.0.1295.0 Safari/537.15")
            .referrer("http://www.bing.com")
            .method(Connection.Method.GET).execute();
    doc = res.parse();
    System.out.println(bingSearchUrl.replaceAll("keyword", "intitle:\""
            + keyword + "\""));
    elements = doc.select("li[class^=b_algo]");
    System.out.println(doc.html());
    System.out.println(elements.html());
    // String divContents =
    // doc.select(".id-app-orig-desc").first().text();
    // elements.remove("div");
    if (elements.html().contains("<strong>" + keyword + "</strong>")) {
        keywordexist = true;
        System.out.println("keyword exists");
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

I'm trying to use jsoup to check a list of keywords against Bing Search, but whenever I run my program, jsoup always ends up on Bing's captcha page. Is there any way to avoid this? I thought adding a user agent and a referrer would fix it, but it doesn't seem to have any effect.

Upvotes: 3

Answers (1)

Stephan

Reputation: 43013

I used code similar to yours and got all the results. However, here are two points I noticed:

  • I think you should slow down between searches. For example, add a random pause of 3000 to 5000 ms before each new request.

  • Don't forget to URL-encode the query parameters.

SAMPLE CODE

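import java.net.URLEncoder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String bingSearchUrl = "http://www.bing.com/search?q=keyword";
String keyword = "stackoverflow jsoup";

String uaString = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.15 (KHTML, like Gecko) Chrome/24.0.1295.0 Safari/537.15";
// URL-encode the whole query so the colon, quotes and spaces are safe in the URL.
// replace() is enough here: "keyword" is a literal placeholder, not a regex.
String url = bingSearchUrl.replace("keyword", URLEncoder.encode("intitle:\"" + keyword + "\"", "UTF-8"));

Document doc = Jsoup.connect(url).userAgent(uaString).get();

// Print the <h2> elements found inside list items.
System.out.println(doc.select("li h2"));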
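
The sample does not show the pause from the first point. Here is a minimal sketch of one way to add it when looping over several keywords. It assumes Java 10 or later (for the Charset overload of URLEncoder.encode). The class name BingThrottle, its method names and the second keyword are made up for illustration, and the actual jsoup fetch is left as a comment so the block only needs the standard library:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

public class BingThrottle {
    // Pause bounds taken from the suggestion above (3000 to 5000 ms).
    static final long MIN_PAUSE_MS = 3000;
    static final long MAX_PAUSE_MS = 5000;

    // Picks a random pause length between MIN_PAUSE_MS and MAX_PAUSE_MS, inclusive.
    static long randomPauseMs() {
        return ThreadLocalRandom.current().nextLong(MIN_PAUSE_MS, MAX_PAUSE_MS + 1);
    }

    // Builds the search URL with the intitle: query URL-encoded.
    static String buildUrl(String keyword) {
        String query = "intitle:\"" + keyword + "\"";
        return "http://www.bing.com/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws InterruptedException {
        String[] keywords = {"stackoverflow jsoup", "java captcha"}; // hypothetical list
        for (int i = 0; i < keywords.length; i++) {
            String url = buildUrl(keywords[i]);
            System.out.println(url);
            // Fetch here, e.g. Jsoup.connect(url).userAgent(uaString).get()
            if (i < keywords.length - 1) {
                Thread.sleep(randomPauseMs()); // wait before the next search
            }
        }
    }
}
```

Whether a 3 to 5 second gap is enough to stay clear of the captcha depends on Bing's side, so treat the bounds as a starting point to tune.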

Upvotes: 2
