Reputation: 21
I want to scrape this site,
but it has CAPTCHA protection.
Is there some way to tick the
"I'm not a robot" checkbox with Python Scrapy?
Upvotes: 0
Views: 1873
Reputation:
This happens when you make frequent requests to a webpage. Scrapy is not a browser automation tool; it just requests a page and parses the HTML. If you want to fill in the CAPTCHA programmatically, you can use Selenium, but it is heavy and uses a lot of RAM.
The solution is to use proxy or user-agent rotation. For example:
import random
user_agents = ['mozilla 1/0', 'googlebot']
Then choose a random user agent:
random_agent = random.choice(user_agents)
Now send the chosen user agent in the User-Agent header when requesting a page.
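In Scrapy, one way to apply the rotation described above to every request is a small downloader middleware. The following is only a minimal sketch: the class name `RandomUserAgentMiddleware` and the `USER_AGENTS` list are hypothetical, and the strings in the list are illustrative placeholders rather than a vetted set.

```python
import random

# Illustrative placeholders (an assumption); replace with real, current
# browser user-agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware that picks a random user agent
    for each outgoing request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header on this request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None lets Scrapy continue processing the request.
        return None
```

To try it, the class would be registered in `settings.py`, for example `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomUserAgentMiddleware": 400}`. Both the module path and the priority value here are assumptions to adapt to your project.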
Scrapy also provides many middlewares for this purpose: https://doc.scrapy.org/en/1.4/topics/spider-middleware.html
List of user agents: https://deviceatlas.com/blog/list-of-user-agent-strings
Web crawlers use such techniques. Cheers!
Upvotes: 2