flegmate

Reputation: 1

BeautifulSoup doesn't return the whole HTML seen in inspect

I'm trying to parse the HTML of a live sports results website, but my code doesn't return every span tag on the site. In the browser inspector I can see that all the matches are in span tags, but my code can't find anything on the page apart from the header and footer. I also tried the divs, and those didn't work either. I'm new to this and a bit lost. This is my code; could someone help me? I left out the first part of the for loop for clarity.

# Imports assumed from the aliases used below
from datetime import date, timedelta
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# Creating the urls for the different dates
# (i is the counter of the omitted for loop: the number of days to go back)
today = date.today() - timedelta(days=i)
d1 = today.strftime("%Y-%m-%d/")
my_url = 'https://www.livescore.com/en/football/{}'.format(d1)
print(my_url)

# Opening up the connection and grabbing the html
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# HTML parser
page_soup = soup(page_html, "html.parser")
spans = page_soup.findAll("span")
matches = page_soup.findAll("div", {"class": "LiveRow-w0tngo-0 styled__Root-sc-2sc0sh-0 styled__FootballRoot-sc-2sc0sh-4 eAwOMF"})
print(spans)

Upvotes: 0

Views: 1426

Answers (2)

chitown88

Reputation: 28565

The page is dynamic and rendered by JavaScript. When you make a request, you get the static html response from before the page is rendered. There are a few things you could do to work with this situation:

  1. Use something like Selenium, which drives a real browser. It'll open a browser, go to the site, and let the site render the page. Once the page is rendered, you can THEN get the html of that page, which will have the data. It works, but it takes longer to process since it literally goes through the steps you would do manually.
  2. Use the requests-HTML package, which also allows the page to be rendered (I have not tried this package myself, as it conflicts with my IDE, Spyder). This would be similar to Selenium, but without the browser actually opening. It's essentially the requests package with JavaScript support.
  3. See if the data is embedded in the static html response, inside the <script> tags, in json format. Sometimes you'll find it there, but it takes a little work to pull it out and manipulate it into valid json that can be read with json.loads().
  4. Check whether there is an API of some sort (look at the XHR requests) and fetch the data directly from there.
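As a rough illustration of option 3, the sketch below collects the contents of every <script> tag in a static response and keeps the ones that parse as JSON. It uses only the standard library so it runs without extra packages; the same idea works with BeautifulSoup's find_all("script"). The sample page, the "state" id and the "matches" key are hypothetical and not taken from livescore.com, and real pages often wrap the JSON in JavaScript that has to be trimmed off first.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_json_from_scripts(html):
    """Return the decoded content of every script body that is valid JSON."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    found = []
    for body in collector.scripts:
        try:
            found.append(json.loads(body))
        except json.JSONDecodeError:
            # Ordinary JavaScript (or an empty tag), not a JSON payload: skip it.
            continue
    return found


# Hypothetical static response: an empty root div plus an embedded JSON payload.
sample_html = """
<html><body>
<div id="root"></div>
<script>window.dataLayer = [];</script>
<script id="state" type="application/json">{"matches": [{"home": "A", "away": "B"}]}</script>
</body></html>
"""

payloads = extract_json_from_scripts(sample_html)
print(payloads)
```

If the list comes back empty for the real page, the data is probably not embedded in the static html, and option 4 is the next thing to check.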

The best option is always #4 if it's available. Why? Because the data will be consistently structured. Even if the website changes its structure or its css (which would change the html you parse), the underlying data feeding into it will rarely change its structure. This site does have an api to access the data:

import requests
import datetime

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'}

dates_list = ['20210214', '20210215', '20210216']

for dateStr in dates_list:
    url = f'https://prod-public-api.livescore.com/v1/api/react/date/soccer/{dateStr}/0.00'
    dateStr_alpha = datetime.datetime.strptime(dateStr, '%Y%m%d').strftime('%B %d')
    response = requests.get(url, headers=headers).json()
    stages = response['Stages']
    for stage in stages:
        location = stage['Cnm']
        stageName = stage['Snm']
        events = stage['Events']
        print('\n\n%s - %s\t%s' %(location, stageName, dateStr_alpha))
        print('*'*50)
        for event in events:
            outcome = event['Eps']
            team1Name = event['T1'][0]['Nm']
            # Fall back to '?' when the score key is missing from the event
            team1Goals = event.get('Tr1', '?')

            team2Name = event['T2'][0]['Nm']
            team2Goals = event.get('Tr2', '?')
            print('%s\t%s %s - %s %s' %(outcome, team1Name, team1Goals, team2Name, team2Goals))

Output:

England - Premier League        February 15
********************************************************************************
FT      West Ham United 3 - Sheffield United 0
FT      Chelsea 2 - Newcastle United 0


Spain - LaLiga Santander        February 15
********************************************************************************
FT      Cadiz 0 - Athletic Bilbao 4


Germany - Bundesliga    February 15
********************************************************************************
FT      Bayern Munich 3 - Arminia Bielefeld 3


Italy - Serie A February 15
********************************************************************************
FT      Hellas Verona 2 - Parma Calcio 1913 1


Portugal - Primeira Liga        February 15
********************************************************************************
FT      Sporting CP 2 - Pacos de Ferreira 0


Belgium - Jupiler League        February 15
********************************************************************************
FT      Gent 4 - Royal Excel Mouscron 0


Belgium - First Division B      February 15
********************************************************************************
FT      Westerlo 1 - Lommel 1


Turkey - Super Lig      February 15
********************************************************************************
FT      Genclerbirligi 0 - Besiktas 3
FT      Antalyaspor 1 - Yeni Malatyaspor 1


Brazil - Serie A        February 15
********************************************************************************
FT      Gremio 1 - Sao Paulo 2
FT      Ceara 1 - Fluminense 3
FT      Sport Recife 0 - Bragantino 0


Italy - Serie B February 15
********************************************************************************
FT      Cosenza 2 - Reggina 2


France - Ligue 2        February 15
********************************************************************************
FT      Sochaux 2 - Valenciennes 0
FT      Toulouse 3 - AC Ajaccio 0


Spain - LaLiga Smartbank        February 15
********************************************************************************
FT      Castellon 1 - Fuenlabrada 2
FT      Real Oviedo 3 - Lugo 1


...


Uganda - Super League   February 16
********************************************************************************
FT      Busoga United FC 1 - Bright Stars FC 1
FT      Kitara FC 0 - Mbarara City 1
FT      Kyetume 2 - Vipers SC 2
FT      UPDF FC 0 - Onduparaka FC 1
FT      Uganda Police 2 - BUL FC 0


Uruguay - Primera División: Clausura    February 16
********************************************************************************
FT      Boston River 0 - Montevideo City Torque 3


International - Friendlies  Women       February 16
********************************************************************************
FT      Guatemala 3 - Panama 1


Africa - Africa Cup Of Nations U20: Group C     February 16
********************************************************************************
FT      Ghana U20 4 - Tanzania U20 0
FT      Gambia U20 0 - Morocco U20 1


Brazil - Amazonense: Group A    February 16
********************************************************************************
Postp.  Manaus FC ? - Penarol AC AM ?
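The date strings in the code above are hard-coded, while the question builds them by counting back from today. A minimal sketch of how the two could be combined, with the parsing split from the network call so it can be checked offline: recent_date_strings and event_rows are hypothetical helper names, and the sample dictionary is a hand-made miniature that only assumes the keys used above ('Stages', 'Cnm', 'Snm', 'Events', 'Eps', 'T1', 'T2', 'Tr1', 'Tr2'). Whether the real API returns scores as strings or numbers is not confirmed here.

```python
from datetime import date, timedelta


def recent_date_strings(n_days, today=None):
    """Return YYYYMMDD strings for today and the n_days - 1 days before it."""
    today = today or date.today()
    return [(today - timedelta(days=i)).strftime("%Y%m%d") for i in range(n_days)]


def event_rows(response):
    """Flatten a decoded response into
    (country, stage, status, team1, goals1, team2, goals2) tuples."""
    rows = []
    for stage in response.get("Stages", []):
        for event in stage.get("Events", []):
            rows.append((
                stage["Cnm"], stage["Snm"], event["Eps"],
                event["T1"][0]["Nm"], event.get("Tr1", "?"),
                event["T2"][0]["Nm"], event.get("Tr2", "?"),
            ))
    return rows


# Hand-made miniature payload (assumed shape, values copied from the output above).
sample = {"Stages": [
    {"Cnm": "England", "Snm": "Premier League", "Events": [
        {"Eps": "FT", "T1": [{"Nm": "Chelsea"}], "T2": [{"Nm": "Newcastle United"}],
         "Tr1": "2", "Tr2": "0"},
    ]},
    {"Cnm": "Brazil", "Snm": "Amazonense: Group A", "Events": [
        # No score keys here, as for a postponed match.
        {"Eps": "Postp.", "T1": [{"Nm": "Manaus FC"}], "T2": [{"Nm": "Penarol AC AM"}]},
    ]},
]}

dates_list = recent_date_strings(3, today=date(2021, 2, 16))
rows = event_rows(sample)
print(dates_list)
for row in rows:
    print(row)
```

In the loop above, dates_list could then come from recent_date_strings(3), and event_rows(response) could replace the nested loops if rows are more convenient than printed lines.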

Upvotes: 2

Insula

Reputation: 953

Now assuming you have the correct class to scrape, a simple loop would work:

# page_soup is the parsed document from the question (soup there is only the class alias)
for i in page_soup.find_all("div", {"class":"LiveRow-w0tngo-0 styled__Root-sc-2sc0sh-0 styled__FootballRoot-sc-2sc0sh-4 eAwOMF"}):
    print(i)

Or add it into a list:

teams = []

for i in page_soup.find_all("div", {"class":"LiveRow-w0tngo-0 styled__Root-sc-2sc0sh-0 styled__FootballRoot-sc-2sc0sh-4 eAwOMF"}):
    teams.append(i.text)
print(teams)

If this does not work, run some tests to check that you are actually scraping the right elements, e.g. print a single item.
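One possible way to run such a test is to count the tags in the html you actually downloaded and check whether the class you are looking for appears in it at all. The sketch below uses only the standard library; inspect_html is a hypothetical helper, and static_html is a made-up stand-in for a response that contains only a header, an empty root div and a footer.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags by name and record every class token seen."""

    def __init__(self):
        super().__init__()
        self.tag_counts = {}
        self.class_names = set()

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        for name, value in attrs:
            if name == "class" and value:
                self.class_names.update(value.split())


def inspect_html(html, wanted_class):
    """Report how many span/div tags the html has and whether wanted_class occurs."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {
        "spans": counter.tag_counts.get("span", 0),
        "divs": counter.tag_counts.get("div", 0),
        "class_present": wanted_class in counter.class_names,
    }


# Made-up stand-in for a static response without any rendered match rows.
static_html = (
    '<html><body><header><span>Menu</span></header>'
    '<div id="root"></div>'
    '<footer><span>About</span></footer></body></html>'
)

report = inspect_html(static_html, "LiveRow-w0tngo-0")
print(report)
```

If class_present is False for the html you downloaded, no selector will find the match rows, because they are not in that response in the first place.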

Also, I see that your code prints "spans" and not "matches"; this could also be part of the problem.

You can also look at this post, which explains further how to do this.

Upvotes: 0
