ldragicevic

Reputation: 711

Python Requests - "To continue your browser has to accept cookies and has to have JavaScript enabled."

I would like to scrape some ads for personal use from mobile.de.

I am using Python 3.6 with the requests library, but I am running into some kind of bot detection. How can I get past this check on their website?

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.mobile.de/?lang=en")
bs = BeautifulSoup(r.content, 'lxml')
print(bs)

This code prints the following:

<p>To continue your browser has to accept cookies and has to have JavaScript enabled.</p>

Where can I find the logic that I need to solve in order to pass this?

Upvotes: 11

Views: 34724

Answers (2)

KC.

Reputation: 3107

The reason you got unexpected content is that your request does not carry valid headers, just as @afit said. But the message "To continue your browser has to accept cookies and has to have JavaScript enabled." also makes sense: if JavaScript is not enabled, the full content will not load.

Note: I recommend using Selenium for this. requests_html cannot access the website successfully because it lacks suitable headers while rendering. By the way, if you want to access a URL that is generated inside JavaScript and grab its content, it will be a tough job.

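from bs4 import BeautifulSoup
from selenium import webdriver

# A real browser runs the page's JavaScript and accepts its cookies.
dr = webdriver.Chrome()
try:
    dr.get("https://www.mobile.de/?lang=en")
    bs = BeautifulSoup(dr.page_source, "lxml")
finally:
    dr.quit()  # close the browser even if loading fails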

Upvotes: 8

Aidan Fitzpatrick

Reputation: 2035

They could be doing this in a number of different ways, ranging from trivial to tricky to bypass at scale. One thing to try is changing your User-Agent, since the simplest check on their side would be to deny requests based on it.

import requests

r = requests.get(
    'https://yoursite.com',
    headers={
        # Placeholder: paste the User-Agent string of a real browser here.
        'User-Agent': 'Popular browser\'s user-agent',
    },
)

It doesn't look like it from the example URL you show, but they could be expecting that URL to be hit after hitting another page on the site that drops a cookie. If that's the case, make the earlier request and provide the cookie in your requests call.
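
A minimal sketch of that idea, assuming the site only needs a browser-like User-Agent plus a cookie dropped by an earlier page (neither is confirmed for mobile.de). The helper names `make_session` and `fetch_after_landing`, the User-Agent string, and the choice of landing page are all illustrative assumptions. It relies on `requests.Session`, which stores cookies from earlier responses and sends them back on later requests.

```python
import requests

# Assumption: an example browser User-Agent; substitute your own browser's string.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends a browser-like User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_after_landing(session, landing_url, target_url):
    """Visit landing_url first so its cookies are stored, then fetch target_url."""
    # Cookies set by this response are kept in session.cookies.
    session.get(landing_url, timeout=10)
    # The same session resends those cookies with this request.
    return session.get(target_url, timeout=10)
```

For example, `fetch_after_landing(make_session(), "https://www.mobile.de/", "https://www.mobile.de/?lang=en")` would make both requests with the same cookie jar. If the check is enforced by JavaScript rather than by headers and cookies, this will not be enough.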

Upvotes: 5
