Reputation: 13
This is the 3rd or 4th time I have used BeautifulSoup. I am using it together with the requests library to scrape data from a sports website. I am trying to scrape athletes' info such as name, age, height, etc. However, when I try to print the info (print(player_name)), I get this instead of what is displayed on the web page:
Name:{{details.player.person.lastName}}, {{details.player.person.firstName}}
Is there any way of accessing the real data?
My code:
import requests
from bs4 import BeautifulSoup

def scrape_player(player_url):
    response_player = requests.get(player_url)
    player_soup = BeautifulSoup(response_player.text, 'html.parser')
    div = player_soup.find('div', {'class': 'player-info-row'})
    player_name = div.text
    print(player_name)

if __name__ == '__main__':
    scrape_player('https://ehfcl.eurohandball.com/men/20212/player/LFpFsiLDFvxs_tXnKlFAQw/luis-frade/')
Upvotes: 1
Views: 63
Reputation: 3400
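The website loads its data dynamically: the values are filled in by JavaScript after the page loads, so the HTML that requests downloads still contains the unrendered {{...}} placeholders, and bs4 cannot capture the real values through tags or classes. The data is, however, present in a script tag in the page source.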
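A quick way to confirm that you are looking at an unrendered template is to search the downloaded text for {{...}} placeholders. This is only a minimal sketch: the PLACEHOLDER pattern and the has_unrendered_placeholders helper are names made up for illustration, and the pattern assumes placeholders look like the dotted paths in the question.

```python
import re

# Assumed placeholder shape: {{dotted.path}}, as seen in the question's output.
PLACEHOLDER = re.compile(r"\{\{\s*[\w.]+\s*\}\}")

def has_unrendered_placeholders(text):
    """Return True if text still contains template placeholders."""
    return PLACEHOLDER.search(text) is not None

# The string the question's code printed instead of the player's name.
raw = "Name:{{details.player.person.lastName}}, {{details.player.person.firstName}}"

print(has_unrendered_placeholders(raw))
print(has_unrendered_placeholders("Name: Skube, Stas"))
```

If the check returns True for response.text, the values are being filled in client-side and have to be taken from another source in the page.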
import requests
from bs4 import BeautifulSoup
url = "https://ehfcl.eurohandball.com/men/2021-22/player/Z8PG_QqFxhA-6PTQ4gcCSA/stas-skube/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
Here we can find the script tag and load its contents as JSON, which returns the data as key-value pairs from which you can extract whatever data you want:
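# Locate the JSON-LD script tag that holds the player's structured data
data = soup.find("script", attrs={"type": "application/ld+json"})
import json
# Parse the tag's text into a Python dict
main_data = json.loads(data.string)
print(main_data['name'])
print(main_data['birthDate'])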
Output:
Skube Stas
1989-11-15
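If you want something a bit more defensive, the same idea can be wrapped in a helper that tolerates a missing or malformed script tag. This is a sketch under assumptions, not the site's guaranteed layout: extract_json_ld is a hypothetical helper name, and SAMPLE_HTML is made-up markup standing in for a downloaded page so the example runs without a network request.

```python
import json
from bs4 import BeautifulSoup

# Made-up sample standing in for a downloaded player page (not real site data).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Person", "name": "Example Player", "birthDate": "1990-01-01"}
</script>
</head><body>
<div class="player-info-row">Name:{{details.player.person.lastName}}</div>
</body></html>
"""

def extract_json_ld(html):
    """Return a list of parsed JSON-LD objects found in html (may be empty)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all("script", attrs={"type": "application/ld+json"}):
        # tag.string is None when the tag has no single text child
        if not tag.string:
            continue
        try:
            results.append(json.loads(tag.string))
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of crashing
            continue
    return results

blocks = extract_json_ld(SAMPLE_HTML)
if blocks:
    player = blocks[0]
    # .get() returns None rather than raising if a key is absent
    print(player.get("name"))
    print(player.get("birthDate"))
```

With a real page you would pass r.text (or r.content) from the requests call instead of SAMPLE_HTML, and check that the returned list is not empty before indexing into it.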
Upvotes: 2