Reputation: 155
I've been using jsoup's connect method to fetch the DOM of certain websites for some time (I wrote a personal bot that makes 20-30 requests per day to those websites). As of today, I can still open and browse the website in my browser, but my Java program can no longer access it. The one thing I noticed that changed is that CloudFlare now checks my browser (DDoS protection). My connect code looks like this:
doc = Jsoup.connect(url)
.userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
.referrer("http://www.google.com")
.timeout(0)
.get();
and now I get error 503. I tried changing the userAgent to just "Mozzila/5.0", and then I get error 403. This doesn't make any sense to me, but I suspect the Cloudflare system is the cause.
Edit:
I discovered that CloudFlare's "I'm Under Attack" protection requires the browser to have JavaScript and cookies enabled, and grants access to the website after 5 seconds. How can I reproduce that behavior in my Java program?
Upvotes: 0
Views: 1906
Reputation: 461
Most websites impose limits to protect themselves from crashes or attacks. The same thing happened to me when I tried to access GitHub data. I don't see any authentication in your code (you may have left it out, which I can understand). Some sites allow authenticated clients a higher request limit, so it is worth trying to authenticate your requests.
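Another problem is that you set the timeout to 0. In jsoup, timeout(0) means an infinite timeout, so a stalled connection can hang forever. Set it to something reasonable, such as 30 seconds; the value is in milliseconds, so that is timeout(30000).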
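To make the timeout advice concrete, here is a minimal sketch. It uses the JDK's own HttpURLConnection instead of jsoup so that it runs without extra dependencies; with jsoup, the rough equivalent would be .timeout(30000) plus .ignoreHttpErrors(true) and .execute() to read the status code. The class name FetchWithTimeout, the helper looksLikeCloudflareBlock, and the example URL are made up for illustration. Checking for "cloudflare" in the Server header is only a heuristic I am assuming, not an official way to detect the challenge page.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

public class FetchWithTimeout {

    // 30 seconds, in milliseconds. For these setters, 0 would mean "wait forever".
    static final int TIMEOUT_MS = 30_000;

    /**
     * Heuristic (an assumption, not an official API): a 403 or 503 response
     * whose Server header mentions "cloudflare" probably came from the
     * Cloudflare edge rather than from the site itself.
     */
    static boolean looksLikeCloudflareBlock(int status, String serverHeader) {
        boolean blockedStatus = status == 403 || status == 503;
        return blockedStatus
                && serverHeader != null
                && serverHeader.toLowerCase(Locale.ROOT).contains("cloudflare");
    }

    /** Requests the URL with finite timeouts and returns a short status summary. */
    static String describe(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(TIMEOUT_MS); // limit on establishing the connection
        conn.setReadTimeout(TIMEOUT_MS);    // limit on waiting for response data
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try {
            // getResponseCode() returns 4xx/5xx codes instead of throwing.
            int status = conn.getResponseCode();
            String server = conn.getHeaderField("Server");
            String note = looksLikeCloudflareBlock(status, server)
                    ? " (likely a Cloudflare block or challenge)"
                    : "";
            return status + note;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default URL; pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(describe(url));
    }
}
```

A finite timeout will not make the 503 go away, but it keeps the bot from hanging, and reading the status code and Server header at least tells you whether the refusal comes from Cloudflare or from the site itself.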
Upvotes: 1