Duarte Guerreiro

Reputation: 13

Not scraping website data with beautifulsoup

This is the third or fourth time I have used BeautifulSoup. I am using it together with the requests library to scrape data from a sports website. I am trying to scrape athletes' info such as name, age, and height. However, when I print the result (print(player_name)), I get the following instead of what is displayed on the website page:

Name:{{details.player.person.lastName}}, {{details.player.person.firstName}}

Is there any way of accessing the real data?

My code:

import requests
from bs4 import BeautifulSoup

def scrape_player(player_url):
    response_player = requests.get(player_url)
    player_soup = BeautifulSoup(response_player.text, 'html.parser')
    div = player_soup.find('div', {'class': 'player-info-row'})
    player_name = div.text
    print(player_name)


if __name__ == '__main__':
    scrape_player('https://ehfcl.eurohandball.com/men/20212/player/LFpFsiLDFvxs_tXnKlFAQw/luis-frade/')

Upvotes: 1

Views: 63

Answers (1)

Bhavya Parikh

Reputation: 3400

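The website fills in the page with JavaScript, so the content is loaded dynamically. The HTML that requests receives still contains the unrendered template placeholders (the {{...}} text), which is why bs4 cannot capture the values via tags or classes. The data is, however, present in a script tag in the page source.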

import requests
from bs4 import BeautifulSoup
url = "https://ehfcl.eurohandball.com/men/2021-22/player/Z8PG_QqFxhA-6PTQ4gcCSA/stas-skube/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

Here we find the script tag of type application/ld+json and parse its contents with json.loads, which returns the data as key-value pairs (a dict), so you can extract whatever fields you need:

import json

data = soup.find("script", attrs={"type": "application/ld+json"})
main_data = json.loads(data.string)

print(main_data['name'])
print(main_data['birthDate'])

Output:

Skube Stas
1989-11-15
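
If the page might not contain the script tag, or might contain several, a slightly more defensive version could look like the sketch below. The helper name extract_json_ld and the SAMPLE_HTML string are my own, made up for illustration; the sample stands in for the real response so the snippet runs without network access, and the actual page may well contain more fields than the two shown.

```python
import json

from bs4 import BeautifulSoup


def extract_json_ld(html):
    """Return every JSON-LD object embedded in the given HTML as a list of dicts.

    Hypothetical helper, not part of bs4 or requests.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
        raw = script.string
        if not raw:
            continue  # empty tag, nothing to parse
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of crashing
        # A block may hold a single object or a list of objects.
        if isinstance(parsed, list):
            results.extend(item for item in parsed if isinstance(item, dict))
        elif isinstance(parsed, dict):
            results.append(parsed)
    return results


# Made-up sample page standing in for response.text (assumption, not the real markup).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"name": "Skube Stas", "birthDate": "1989-11-15"}</script>
</head><body>
<div class="player-info-row">Name:{{details.player.person.lastName}}</div>
</body></html>
"""

player = extract_json_ld(SAMPLE_HTML)[0]
print(player.get("name"))
print(player.get("birthDate"))
```

Against the live site you would pass requests.get(url).text instead of SAMPLE_HTML. Using .get() on the dict avoids a KeyError if a field such as birthDate happens to be missing for some player.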

Upvotes: 2
