Pthomas

Reputation: 355

Python - Request being blocked by Cloudflare

I am trying to log into a website. When I print g.text, I do not get back the web page I expect. Instead, I get a Cloudflare page that says 'Checking your browser before accessing'.

import requests
import time

s = requests.Session()
s.get('https://www.off---white.com/en/GB/')

headers = {'Referer': 'https://www.off---white.com/en/GB/login'}

payload = {
    'utf8':'✓',
    'authenticity_token':'',
    'spree_user[email]': '[email protected]',
    'spree_user[password]': 'PASSWORD',
    'spree_user[remember_me]': '0',
    'commit': 'Login'
}

r = s.post('https://www.off---white.com/en/GB/login', data=payload, headers=headers)

print(r.status_code)

g = s.get('https://www.off---white.com/en/GB/account')

print(g.status_code)
print(g.text)

Why is this occurring when I have set the session?

Upvotes: 29

Views: 78964

Answers (6)

Feyiz Pekel

Reputation: 1

I tried scraper libraries, but they did not work. However, I got the site's IP address and added it to C:\Windows\System32\drivers\etc\hosts as a line of the form 123.12.12.123 www.example.com server.example.com, then saved the file. After that, I translated the curl command to Python code and got a correct response.

Upvotes: -2

Praveen Kumar

Reputation: 959

You can scrape a Cloudflare-protected page with cfscrape. Node.js must be installed for the code to work correctly.

Download Node.js from https://nodejs.org/en/

import cfscrape  # pip install cfscrape

scraper = cfscrape.create_scraper()
res = scraper.get("https://www.example.com").text
print(res)

Upvotes: 5

whoami

Reputation: 51

curl and hx avoid this problem. But how? I found that they use HTTP/2 by default, whereas the requests library only supports HTTP/1.1.

So, as a test, I installed httpx together with the h2 Python library (to support HTTP/2 requests), and it works if I run: httpx --http2 'https://some.url'.

So, the solution is to use a library that supports HTTP/2, for example httpx with h2.

It's not a complete solution, since it won't help to solve Cloudflare's anti-bot ("I'm Under Attack Mode", or IUAM) challenge.
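
The same HTTP/2 request can also be made from Python code rather than the command line. This is only a minimal sketch, assuming httpx is installed with its HTTP/2 extra (pip install "httpx[http2]"). The function names fetch_over_http2 and summarize are made up for illustration, and 'https://some.url' remains a placeholder.

```python
def fetch_over_http2(url):
    """Fetch a URL with an HTTP/2-capable httpx client and return the response."""
    # Third-party dependency, imported here so the helper below stays usable
    # even when httpx is not installed: pip install "httpx[http2]"
    import httpx

    # http2=True lets the client negotiate HTTP/2; it falls back to
    # HTTP/1.1 if the server does not offer it.
    with httpx.Client(http2=True, follow_redirects=True) as client:
        return client.get(url)


def summarize(response):
    """Return the negotiated protocol and status code as one short string."""
    return f"{response.http_version} {response.status_code}"
```

Calling print(summarize(fetch_over_http2('https://some.url'))) shows which protocol version was actually negotiated, which is a quick way to confirm that HTTP/2 is in use.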

Upvotes: 5

Alvaro G

Reputation: 61

I had the same problem because Cloudflare was put in front of the API. I solved it this way:

import cloudscraper
import json

scraper = cloudscraper.create_scraper()
r = scraper.get("MY API").text  # "MY API" stands for the API URL
y = json.loads(r)
print(y)

Upvotes: 5

Eiri

Reputation: 755

You might want to try this:

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."

It does not require a Node.js dependency. All credit goes to this PyPI page.

Upvotes: 48

Jeremiah

Reputation: 633

This happens because the page is protected by Cloudflare's anti-bot page (or IUAM). Setting up a session does not help, because the challenge has to be passed before Cloudflare lets any request through.
Bypassing this check is quite difficult to do on your own, since Cloudflare changes its techniques periodically. Currently, it checks whether the client supports JavaScript, which can be spoofed.
I would recommend using the cfscrape module to bypass it.
To install it, run pip install cfscrape. You'll also need to install Node.js.
You can pass a requests session into create_scraper() like so:

import cfscrape
import requests

session = requests.Session()
session.headers = ...
scraper = cfscrape.create_scraper(sess=session)
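
To check whether a response is the anti-bot page rather than the real one, a small heuristic like the following may help. It is only a sketch: the marker string is the text quoted in the question, the function name looks_like_cloudflare_challenge is made up for illustration, and the 403/503 status codes are an assumption about how the challenge is usually served, not something Cloudflare guarantees.

```python
# Text shown on the interstitial page, as quoted in the question.
CHALLENGE_MARKER = "Checking your browser before accessing"


def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic: return True if a response resembles the Cloudflare anti-bot page."""
    # Header names are case-insensitive, so compare them in lowercase.
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()
    served_by_cloudflare = "cloudflare" in server

    has_marker = CHALLENGE_MARKER.lower() in body.lower()

    # Assumption: the challenge is usually served with 403 or 503, not 200.
    blocked_status = status_code in (403, 503)

    return has_marker or (served_by_cloudflare and blocked_status)
```

With the code from the question, this would be called as looks_like_cloudflare_challenge(g.status_code, g.headers, g.text) before trying to parse the account page.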

Upvotes: 12
