Sahli9876

Reputation: 33

python webscraping results in block

I want to scrape the German real estate website immobilienscout24.de. This is not for commercial use or publication, and I do not intend to spam the site; it is merely for coding practice. I would like to write a Python tool that automatically downloads the HTML of given immobilienscout24.de pages. I have tried BeautifulSoup for this; however, the parsed HTML doesn't show the content but instead asks if I am a robot, meaning my web scraper got detected and blocked (I can access the site in Firefox just fine). I have set a referer, a delay, and a random user agent. What else can I do to avoid being detected (e.g. rotating proxies, random clicks, headless Chrome, this script, other web scraping tools that don't get detected)? Things I found online that might be the reason for the block:

If someone has a working solution with which one can scrape the site, say, 10 times without being blocked, I would be very thankful. Here is my code so far:

from bs4 import BeautifulSoup
import requests
import numpy
import time
from fake_useragent import UserAgent

def get_html(url, headers): #scrapes and parses the HTML of a given URL while using custom header
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

ua = UserAgent()
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "de,de-DE;q=0.8,en;q=0.6", 
    "DNT": "1", 
    "Host": "www.immobilienscout24.de",  # Host is a bare hostname, not a URL
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": ua.random, 
  }
delays = [3, 5, 7, 4, 4, 11]
time.sleep(numpy.random.choice(delays))
test = get_html("https://www.immobilienscout24.de/Suche/de/baden-wuerttemberg/heidelberg/wohnung-kaufen?enteredFrom=one_step_search", headers)
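For what it's worth, the header and random-delay ideas from the code above can be factored into two small helpers. The function names and the hard-coded Firefox user agent string are mine, not from the question; this is just a sketch of the same approach without the network call:

```python
import random
import time

def build_headers(user_agent):
    # Hypothetical helper: assemble the same kind of header set as above
    # around a given User-Agent string.
    return {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "de,de-DE;q=0.8,en;q=0.6",
        "Referer": "https://www.immobilienscout24.de/",
        "User-Agent": user_agent,
    }

def polite_delay(low=3.0, high=11.0):
    # Hypothetical helper: sleep for a random amount of time within the given
    # bounds, so requests are not evenly spaced. Returns the delay used.
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay

headers = build_headers(
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)
```

Drawing the delay from a continuous range instead of a fixed list of six values makes the request timing a little less regular, though by itself this will not defeat bot detection.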

Upvotes: 1

Views: 319

Answers (1)

alex hernandez

Reputation: 136

This code still needs more work, but my guess is that plain requests doesn't work because the page needs to run JavaScript. If you use something like Selenium it should work, because it can run JS code. requests_html also bundles pyppeteer, which is similar to Selenium.


from requests_html import HTMLSession

# create the session
session = HTMLSession()

#define our URL
url = 'https://www.immobilienscout24.de/Suche/de/baden-wuerttemberg/heidelberg/wohnung-kaufen'

#use the session to get the data
r = session.get(url)

# Render the page; raise scrolldown to page down multiple times on a page.
# keep_page=True keeps the underlying pyppeteer page around for the screenshot.
r.html.render(sleep=1, timeout=30, keep_page=True, scrolldown=1)

# page.screenshot is a coroutine, so it has to be driven by the event loop;
# its options dict takes a 'path' key pointing at the output file
import asyncio
asyncio.get_event_loop().run_until_complete(
    r.html.page.screenshot({'path': 'C:/Users/program/Desktop/help/example.png'})
)

print(r.html.html)  # the rendered HTML (r.text is the pre-render response body)
session.close()
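The Selenium route mentioned in the answer could be sketched roughly like this. This is untested against the site; the Chrome flags are common choices for headless scraping (not guaranteed to avoid detection), the function names are mine, and a matching chromedriver must be installed for the fetch to actually run:

```python
def headless_chrome_args():
    # Chrome flags often used for headless scraping with Selenium
    # (an assumption of this sketch, not taken from the answer above)
    return [
        "--headless=new",
        "--disable-blink-features=AutomationControlled",
        "--window-size=1920,1080",
    ]

def fetch_with_selenium(url):
    # Imported inside the function so the rest of this sketch works
    # even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # Chrome executes the page's JavaScript
        return driver.page_source
    finally:
        driver.quit()

# usage (hits the network, needs chromedriver):
# html = fetch_with_selenium(
#     "https://www.immobilienscout24.de/Suche/de/baden-wuerttemberg/heidelberg/wohnung-kaufen"
# )
```

Because a real browser runs the page's JavaScript, the returned `page_source` is the rendered DOM rather than the raw response body that requests sees.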

Upvotes: 1
