carlos saborio

Reputation: 25

Python BeautifulSoup: no data retrieved between div tags

I have been trying to scrape data from the website "http://www.jps.go.cr/" using BeautifulSoup. However, in the result everything between the tags is missing. I can confirm that the data is there when I inspect the page in the browser, but it does not show up when I run the code.

Here is the code:

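from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "http://www.jps.go.cr/productos/loteria-nacional"
# Send a browser-like User-Agent with the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
# Parse the downloaded HTML
soup = BeautifulSoup(webpage, "html.parser")
# Avoid naming the result "all", which would shadow the built-in
results = soup.find_all("div", {"class": "detail_ultimoSorteo loteria"})
print(results)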

I would appreciate any help since this is driving me crazy; other websites work fine.

Thanks in advance.

Upvotes: 2

Views: 702

Answers (3)

SIM

Reputation: 22440

You can get the required content in a slightly different manner. The pyppeteer library drives a headless Chromium browser, so it can handle content that is generated dynamically by JavaScript. Check out the following implementation:

import asyncio
from pyppeteer import launch

async def fetch_items():
    # Start a headless Chromium instance and open a new tab
    wb = await launch()
    page = await wb.newPage()
    await page.goto("http://www.jps.go.cr/")

    # Grab the container and read its rendered text inside the page
    container = await page.querySelector('.detail_ultimoSorteo')
    items = await page.evaluate('(element) => element.innerText', container)
    print(items.strip())

    # Shut the browser down so no Chromium process is left behind
    await wb.close()

asyncio.run(fetch_items())

Result:

Sorteo 4520
Domingo, 2 de Diciembre 2018
Primer premio

61 366 ₡ 120.000.000
Segundo premio

60 879 ₡ 18.000.000
Tercer premio

92 401 ₡ 8.000.000

Upvotes: 1

Ashfaque Ali Solangi

Reputation: 1891

This will work in your case. (I know PhantomJS has been deprecated; you can use a Chrome driver instead.)

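from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://www.jps.go.cr/productos/loteria-nacional"

# PhantomJS support was removed in Selenium 4; this line needs Selenium 3 or a Chrome driver
browser = webdriver.PhantomJS()
browser.get(url)
# page_source holds the DOM after the page's scripts have run
html = browser.page_source
browser.quit()

soup = BeautifulSoup(html, 'html.parser')

results = soup.find_all("div", {"class": "detail_ultimoSorteo"})
print(results)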


Upvotes: 0

QHarr

Reputation: 84465

The page loads slowly, so you need a tool such as Selenium, which allows enough time for the content to become available.

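from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.jps.go.cr/'
driver = webdriver.Chrome()
driver.get(url)
# find_element(By.CSS_SELECTOR, ...) replaces the find_element_by_* helpers removed in Selenium 4
print(driver.find_element(By.CSS_SELECTOR, '.detail_ultimoSorteo.loteria').text)
driver.quit()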


Upvotes: 1
