Reputation: 960
I have written a web scraper for Google Scholar in Java with JSoup. The scraper searches Scholar for a DOI and finds the citations for that paper. This data is needed for a research project.
However, the scraper only works for the first few requests. After that, it encounters a captcha on the Scholar site.
Yet when I open the website in my browser (Chrome), Google Scholar opens normally.
How is this possible? All requests come from the same IP address! So far I have tried the following options:
When I analyse the requests made by Chrome to Scholar, I see that a cookie containing some session IDs is sent. This is probably why Chrome's requests are not blocked. Is it possible to use this cookie for requests made with JSoup?
Thank you!
Upvotes: 1
Views: 2750
Reputation: 612
There are three things that spring to mind:
I think the first is the most likely option. You should try copying as many of the headers you see in Chrome's request as you can into your Java code.
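As a rough sketch of what copying the headers could look like, the helper below collects browser-like headers and splits a raw `Cookie` header value (as shown in Chrome's developer tools) into name/value pairs. The class name, method names and all header values are placeholders I made up for illustration; substitute whatever your own browser actually sends. It only uses the standard library, so it does not perform the request itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: names and values here are illustrative, not taken from the question.
class BrowserLikeRequest {

    // Headers resembling what a desktop browser sends.
    // Placeholder values: replace them with the ones seen in Chrome's network tab.
    static Map<String, String> browserHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (placeholder, paste your browser's User-Agent here)");
        headers.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "en-US,en;q=0.9");
        return headers;
    }

    // Splits a raw Cookie header value such as "a=1; b=2" into name/value pairs.
    // Parts without '=' are skipped; only the first '=' separates name from value.
    static Map<String, String> parseCookieHeader(String raw) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String part : raw.split(";")) {
            String trimmed = part.trim();
            int eq = trimmed.indexOf('=');
            if (eq > 0) {
                cookies.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Made-up cookie string, standing in for the one copied from the browser.
        Map<String, String> cookies = parseCookieHeader("SID=abc123; PREF=xyz");
        System.out.println(browserHeaders());
        System.out.println(cookies);
    }
}
```

With JSoup, the two maps could then be passed along with something like `Jsoup.connect(url).headers(BrowserLikeRequest.browserHeaders()).cookies(cookies).get()`. Whether that is enough to avoid the captcha is something you would have to try, and the session cookie will likely expire after a while.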
Upvotes: 2