Schaedel420

Reputation: 175

Web scraping with BeautifulSoup and Python not working

I am trying to get a list of website addresses from the following page: https://www.wer-zu-wem.de/dienstleister/filmstudios.html

My code:

import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')
links = soup.find_all('a', {'class': 'col-md-4 col-lg-5 col-xl-4 text-center text-lg-right'})
print(links)

import requests
from bs4 import BeautifulSoup

webLinksList = []

result = requests.get(
    "https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')


website_Links = soup.find_all(
    'div', class_='col-md-4 col-lg-5 col-xl-4 text-center text-lg-right')


if website_Links != "":
    print("List is empty")
for website_Link in website_Links:
    try:
        realLink = website_Link.find(
            "a", attrs={"class": "btn btn-primary external-link"})
        webLinksList.append(featured_challenge.attrs['href'])
    except:
        continue

for link in webLinksList:
    print(link)

"List is empty" is printed at the beginning, and nothing I have tried adds any data to the list.
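Two things in the second snippet hide the real result, regardless of what the server returns. `soup.find_all()` returns a `ResultSet` (a list subclass), which never compares equal to the string `""`, so `website_Links != ""` is always true and "List is empty" is printed even when matches exist. Also, the loop appends `featured_challenge.attrs['href']`, but `featured_challenge` is never defined; the resulting `NameError` is swallowed by the bare `except`, so nothing is ever appended. Below is a minimal sketch of the same loop with both issues fixed, run against a small hand-written HTML sample. The markup and the helper name `extract_web_links` are assumptions modeled on the class names in the question, not the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names used in the question;
# the real page structure may differ.
SAMPLE_HTML = """
<div class="col-md-4 col-lg-5 col-xl-4 text-center text-lg-right">
  <a class="btn btn-primary external-link" href="https://example.com/studio-a">Website</a>
</div>
<div class="col-md-4 col-lg-5 col-xl-4 text-center text-lg-right">
  <a class="btn btn-primary external-link" href="https://example.com/studio-b">Website</a>
</div>
"""


def extract_web_links(html):
    """Return the href of each external-link button inside the matching divs."""
    soup = BeautifulSoup(html, "html.parser")
    website_links = soup.find_all(
        "div", class_="col-md-4 col-lg-5 col-xl-4 text-center text-lg-right")

    # An empty ResultSet is falsy; comparing it to "" is always unequal.
    if not website_links:
        print("List is empty")

    web_links_list = []
    for website_link in website_links:
        # find() returns None when no matching <a> exists, so check it
        # explicitly instead of hiding errors behind a bare except.
        real_link = website_link.find("a", class_="external-link")
        if real_link is not None and real_link.get("href"):
            web_links_list.append(real_link["href"])
    return web_links_list


for link in extract_web_links(SAMPLE_HTML):
    print(link)
```

Matching on the full multi-class string only works when the page lists the classes in exactly that order; matching a single class such as `external-link` is less brittle.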

Upvotes: 0

Views: 65

Answers (2)

MITHU

Reputation: 154

Try the following to get all the links leading to the external websites:

import requests
from bs4 import BeautifulSoup

link = "https://www.wer-zu-wem.de/dienstleister/filmstudios.html"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

result = requests.get(link,headers=headers)
soup = BeautifulSoup(result.text,'lxml')
for links in soup.find_all('a',{'class':'external-link'}):
    print(links.get("href"))
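Both answers send a browser-like `User-Agent` header, which suggests (an assumption, not confirmed in this thread) that the site serves different content to the default `python-requests` agent. The sketch below separates fetching from parsing, so the parsing step can be checked on saved HTML, and a 4xx/5xx response raises an error instead of silently producing an empty list. `fetch_html` and `external_links` are hypothetical helper names; the `external-link` class is the one used in the answer above.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.wer-zu-wem.de/dienstleister/filmstudios.html"
HEADERS = {
    # Browser-like User-Agent, as in the answers; the exact string is arbitrary.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) "
                  "Gecko/20100101 Firefox/85.0",
}


def fetch_html(url=URL, timeout=10):
    """Download the page; raise requests.HTTPError on a 4xx/5xx status."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


def external_links(html):
    """Return the href of every <a> whose class list includes 'external-link'."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector matches one class regardless of the others' order.
    return [a["href"] for a in soup.select("a.external-link[href]")]
```

Usage: `for href in external_links(fetch_html()): print(href)`.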

Upvotes: 2

dimay

Reputation: 2804

Try this:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0",
}

result = requests.get("https://www.wer-zu-wem.de/dienstleister/filmstudios.html", headers=headers)
src = result.content
soup = BeautifulSoup(src, 'lxml')
links = soup.find('ul', {'class': 'wzwListeFirmen'}).find_all("a")
print(links)
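`soup.find()` returns `None` when the `<ul class="wzwListeFirmen">` is missing (for example, if the server returned a different page), and calling a method on `None` raises `AttributeError`. Also, `print(links)` prints whole `<a>` tags rather than URLs. Here is a minimal sketch, assuming the list markup described in this answer, that guards against the missing list and returns only the `href` values. `company_links` is a hypothetical helper name and the sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup


def company_links(html):
    """Return href values of the <a> tags inside <ul class="wzwListeFirmen">."""
    soup = BeautifulSoup(html, "html.parser")
    company_list = soup.find("ul", class_="wzwListeFirmen")
    if company_list is None:
        # The list is missing: likely a different or blocked page was served.
        return []
    # href=True skips <a> tags that have no href attribute.
    return [a["href"] for a in company_list.find_all("a", href=True)]


# Made-up sample markup for illustration only.
SAMPLE_HTML = """
<ul class="wzwListeFirmen">
  <li><a href="/firma/studio-a.html">Studio A</a></li>
  <li><a href="https://example.com/studio-a">Website</a></li>
  <li><a name="anchor-without-href">no link</a></li>
</ul>
"""

print(company_links(SAMPLE_HTML))
```

If some of the hrefs are relative, `urllib.parse.urljoin(base_url, href)` turns them into absolute URLs.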

Upvotes: 2
