Reputation: 25
I have been trying to scrape data from the website "http://www.jps.go.cr/" with BeautifulSoup. However, when I fetch the page, all the information between the tags is missing. I can confirm that the data is there when I inspect the website in the browser, but it does not show up once I run the code.
Here is the code:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
url="http://www.jps.go.cr/productos/loteria-nacional"
req = Request(url,headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup=BeautifulSoup(webpage,"html.parser")
all=soup.find_all("div",{"class":"detail_ultimoSorteo loteria"})
print(all)
I would appreciate any help, since this is driving me crazy. The same approach works on other websites.
Thanks in advance.
Upvotes: 2
Views: 702
Reputation: 22440
You can get the required content in a slightly different manner. The pyppeteer library drives a headless Chromium browser, so it can handle content that is rendered dynamically with JavaScript. Check out the following implementation:
import asyncio
from pyppeteer import launch

async def fetch_items():
    wb = await launch()
    page = await wb.newPage()
    await page.goto("http://www.jps.go.cr/")
    container = await page.querySelector('.detail_ultimoSorteo')
    items = await page.evaluate('(element) => element.innerText', container)
    print(items.strip())

asyncio.get_event_loop().run_until_complete(fetch_items())
Result:
Sorteo 4520
Domingo, 2 de Diciembre 2018
Primer premio
61 366 ₡ 120.000.000
Segundo premio
60 879 ₡ 18.000.000
Tercer premio
92 401 ₡ 8.000.000
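If you need the values rather than the raw text, the result above can be split into fields. This is a minimal sketch that assumes the text always has the shape shown in the sample (a title line, a date line, then alternating label and prize lines); `parse_draw` and `PRIZE_LINE` are hypothetical names, and what the two integers on each prize line mean is left uninterpreted.

```python
import re

# Assumed line shape, taken from the sample output above:
# "<int> <int> ₡ <amount using dots as thousands separators>"
PRIZE_LINE = re.compile(r"^(\d+)\s+(\d+)\s+₡\s*([\d.]+)$")


def parse_draw(text):
    """Turn the scraped innerText into a dict (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    draw = {"title": lines[0], "date": lines[1], "prizes": []}
    label = None
    for line in lines[2:]:
        match = PRIZE_LINE.match(line)
        if match is None:
            # Not a prize line, so treat it as a label such as "Primer premio".
            label = line
            continue
        first, second, amount = match.groups()
        draw["prizes"].append({
            "label": label,
            "numbers": (int(first), int(second)),
            "amount": int(amount.replace(".", "")),  # amount in colones
        })
    return draw


sample = """Sorteo 4520
Domingo, 2 de Diciembre 2018
Primer premio
61 366 ₡ 120.000.000
Segundo premio
60 879 ₡ 18.000.000
Tercer premio
92 401 ₡ 8.000.000"""

draw = parse_draw(sample)
for prize in draw["prizes"]:
    print(prize["label"], prize["numbers"], prize["amount"])
```

In the pyppeteer script you would pass `items` to `parse_draw` instead of printing it.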
Upvotes: 1
Reputation: 1891
This will work in your case. I am posting it with PhantomJS, which I know has been deprecated; you can swap in the Chrome driver (webdriver.Chrome()) instead.
from bs4 import BeautifulSoup
from selenium import webdriver
url="http://www.jps.go.cr/productos/loteria-nacional"
browser = webdriver.PhantomJS()
browser.get(url)
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
all=soup.find_all("div",{"class":"detail_ultimoSorteo"})
print(all)
Upvotes: 0
Reputation: 84465
The page loads its content slowly, so you need a tool such as Selenium, which drives a real browser and allows enough time for the content to become available.
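from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'http://www.jps.go.cr/'
driver = webdriver.Chrome()
driver.get(url)
# Selenium 4 removed find_element_by_css_selector; use find_element with By.CSS_SELECTOR instead
print(driver.find_element(By.CSS_SELECTOR, '.detail_ultimoSorteo.loteria').text)
driver.quit()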
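If the element is still empty when you read it, you can poll until its text appears. Selenium ships `WebDriverWait` for this purpose; the sketch below only illustrates the same idea with the standard library. `wait_until` and `element_text` are hypothetical helpers, and `driver` is assumed to be a Selenium WebDriver that is already on the page.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


def element_text(driver, css_selector):
    """Return the text of the first match, or "" while it is missing or empty.

    `driver` is assumed to be a Selenium WebDriver; "css selector" is the
    string value behind Selenium's By.CSS_SELECTOR.
    """
    matches = driver.find_elements("css selector", css_selector)
    return matches[0].text.strip() if matches else ""


# Usage with a real browser (not run here):
# driver = webdriver.Chrome()
# driver.get("http://www.jps.go.cr/")
# selector = ".detail_ultimoSorteo.loteria"
# print(wait_until(lambda: element_text(driver, selector), timeout=30))
```

Polling with a deadline avoids a fixed `time.sleep`, which either wastes time on fast loads or is too short on slow ones.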
Upvotes: 1