Reputation: 353
I ran into a Cloudflare issue when I tried to scrape a website. Here is my code:
import cloudscraper
url = "https://author.today"
scraper = cloudscraper.create_scraper()
print(scraper.post(url).status_code)
Running it raises:
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.
I searched for a workaround but couldn't find a solution. If you visit the website in a browser, you see:
Checking your browser before accessing author.today.
Is there any way to bypass Cloudflare in my case?
Upvotes: 19
Views: 42249
Reputation: 77
I can suggest the following workflow to "try" to get past the Cloudflare WAF/bot mitigation:
Source: I have used Cloudflare with hundreds of domains and thousands of records (Enterprise) since the company was founded.
That way you will get closer to the point (and you will help them improve the overall security).
Upvotes: 2
Reputation: 148
Install httpx with HTTP/2 support (the quotes keep shells like zsh from expanding the brackets):
pip3 install 'httpx[http2]'
Create an HTTP/2 client:
import httpx
client = httpx.Client(http2=True)
Make the request:
response = client.get("https://author.today")
print(response.status_code)
Cheers!
Upvotes: 6
Reputation: 104
I'd try to create a Playwright scraper that mimics a real user; this works for me most of the time, you just need to find the right settings (they can vary from website to website). Otherwise, if the website has a native app, try to figure out how the app behaves and then mimic it.
Upvotes: 0
Reputation: 93
I used this line:
scraper = cloudscraper.create_scraper(browser={'browser': 'chrome', 'platform': 'windows', 'mobile': False})
and then used the httpx package after that:
with httpx.Client() as s: # Remaining code
That let me get past the cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge error.
Upvotes: -4
Reputation: 7
import cfscrape
from fake_useragent import UserAgent

ua = UserAgent()
s = cfscrape.create_scraper()
# Send a random browser User-Agent header with the request
k = s.post("https://author.today", headers={"User-Agent": ua.random})
print(k.status_code)
Upvotes: -1
Reputation: 1639
Although it does not seem to work for this site, sometimes adding some parameters when initializing the scraper helps:
import cloudscraper
url = "https://author.today"
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'android',
        'desktop': False
    }
)
print(scraper.post(url).status_code)
Upvotes: 1