Gregory Spanner

Why can BeautifulSoup not find the HTML class?

I'm trying to scrape this page using requests and BeautifulSoup in Python:

I want to get all the information within the article tag with class = "ficha-jogo". When I run the code below, x is an empty list.

import requests
from bs4 import BeautifulSoup

url = "https://globoesporte.globo.com/rs/futebol/brasileirao-serie-a/jogo/25-05-2019/gremio-atletico-mg.ghtml"
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
x = soup.select(".ficha-jogo")
print(x)

I was expecting it to return all tags contained within the article tag with class = "ficha-jogo".

Answers (2)

Dorian Massoulier

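You can also do it with requests_html. Its render() method runs the page's JavaScript in a headless Chromium (downloaded the first time you call it), so the dynamically inserted article exists before you search for it: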

from requests_html import HTMLSession

session = HTMLSession()

url = "https://globoesporte.globo.com/rs/futebol/brasileirao-serie-a/jogo/25-05-2019/gremio-atletico-mg.ghtml"

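r = session.get(url)
# Execute the page's JavaScript so the article is added to r.html
r.html.render()

# first=True returns the first matching element instead of a list
article = r.html.find('.ficha-jogo', first=True).text
print(article)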

bharatk

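This page renders the article data dynamically with JavaScript, so the article element is not in the HTML that requests downloads, and BeautifulSoup has nothing to find. Try the Selenium browser-automation library instead: it drives a real browser, so you can scrape pages whose content is loaded dynamically (by JS or AJAX).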
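The effect can be reproduced offline. The sketch below uses a simplified, made-up page (the names STATIC_HTML and RENDERED_HTML and their markup are hypothetical, not the real globoesporte source) in which the article with class "ficha-jogo" only exists after a script runs. BeautifulSoup parses the HTML exactly as delivered and never executes JavaScript, so the selector finds nothing in the raw source, while the same selector works on the HTML a browser would end up with.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the article only exists inside a script
# that a browser would run after loading the page.
STATIC_HTML = """
<html>
  <body>
    <div id="root"></div>
    <script>
      document.getElementById("root").innerHTML =
        '<article class="ficha-jogo">Paulo Victor 1 GOL</article>';
    </script>
  </body>
</html>
"""

# Hypothetical markup a browser would hold after running the script.
RENDERED_HTML = """
<html>
  <body>
    <div id="root"><article class="ficha-jogo">Paulo Victor 1 GOL</article></div>
  </body>
</html>
"""

# Script contents are treated as plain text, so no article tag is parsed here.
static_matches = BeautifulSoup(STATIC_HTML, "html.parser").select(".ficha-jogo")
# After rendering, the article is a real element and the selector matches it.
rendered_matches = BeautifulSoup(RENDERED_HTML, "html.parser").select(".ficha-jogo")

print(static_matches)                  # []
print(rendered_matches[0].get_text())  # Paulo Victor 1 GOL
```

Selenium avoids this by loading the page in a real browser, which runs the scripts before you read page_source.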

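from bs4 import BeautifulSoup
from selenium import webdriver

# Start a real Chrome instance (needs ChromeDriver, see the links below)
browser = webdriver.Chrome()
url = "https://globoesporte.globo.com/rs/futebol/brasileirao-serie-a/jogo/25-05-2019/gremio-atletico-mg.ghtml"

# The browser executes the page's JavaScript, so page_source contains the rendered DOM
browser.get(url)
soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()  # close the browser once the HTML has been captured

article = soup.find("article", {"class": "ficha-jogo"})
print(article.text)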

Output:

GREPaulo Victor 1GOLLeonardo 6LADPedro Geromel 3ZADRodrigues 38ZAEJuninho Capixaba 29LAEMichel  5VOLMaicon 8VOLJean Pyerre 21MECThaciano 16MECEverton 11ATAAlisson 23ATADiego Tardelli 9ATAAndré 90ATAFelipe Vizeu 10ATACAMVictor 1GOLPatric 2LADLeonardo Silva 3ZADIgor Rabello 16ZAEFábio Santos 6LAEJosé Welison 14VOLNathan 23MECJair 88VOLCazares 10MECGeuvânio 49ATALuan 27MECBruninho 43MECRicardo Oliveira 9ATAChará 8ATARenato GaúchoTécnico4 - 3 - 3Esquema TáticoRodrigo SantanaTécnico4 - 4 - 2Esquema TáticoMostrar ficha completaReservasJúlio César 22GOLLéo Moura 2LADRafael Galhardo 42LADRomulo 13VOLDarlan 37VOLMontoya 20MECVico 15ATAPepê 25ATACleiton 40GOLIago Maidana 19ZADHulk 22LAEAdilson 21VOLVinícius 29MECTerans 20MECAlerrandro 44ATAMaicon 11ATAInformações sobre o jogoArena do GrêmioArena Desportiva
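The output above runs together because .text joins every text node with no separator. Here is a minimal sketch, on a small made-up fragment rather than the real page, of how get_text(separator=" ", strip=True) keeps the values apart:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment loosely shaped like two rows of the lineup.
html = (
    '<article class="ficha-jogo">'
    "<li><span>Paulo Victor</span><span>1</span><span>GOL</span></li>"
    "<li><span>Leonardo</span><span>6</span><span>LAD</span></li>"
    "</article>"
)
article = BeautifulSoup(html, "html.parser").find("article", class_="ficha-jogo")

# .text concatenates the strings directly, with nothing in between.
run_together = article.text
# get_text strips each string and joins the non-empty ones with a space.
readable = article.get_text(separator=" ", strip=True)

print(run_together)  # Paulo Victor1GOLLeonardo6LAD
print(readable)      # Paulo Victor 1 GOL Leonardo 6 LAD
```

If you want the values as a list instead, list(article.stripped_strings) yields each stripped string separately.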

Download the Selenium web driver for Chrome (ChromeDriver):

http://chromedriver.chromium.org/downloads

Install the web driver for Chrome (guide for Ubuntu):

https://christopher.su/2015/selenium-chromedriver-ubuntu/

Selenium tutorial:

https://selenium-python.readthedocs.io/
