Zygote

Reputation: 21

My python script does not print table from html

I am trying to get table data with the code below, but surprisingly the script prints "None" for the table, even though I can clearly see it in my HTML document. I look forward to any help. The image below shows the inspect view of the document.

from urllib2 import urlopen, Request
from bs4 import BeautifulSoup
site = 'http://www.altrankarlstad.com/wisp'
hdr = {'User-Agent': 'Chrome/78.0.3904.108'}
req = Request(site, headers=hdr)
res = urlopen(req)
rawpage = res.read()
page = rawpage.replace("<!-->", "")
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class":"table workitems-table mt-2"})
print (table)

Here is the code with the Selenium script, as suggested:

import time
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.altrankarlstad.com/wisp'

driver = webdriver.Chrome('C:\\Users\\rugupta\\AppData\\Roaming\\Microsoft\\Windows\\Start Menu\\Programs\\Python 3.7\\chromedriver.exe') 

driver.get(url)
driver.find_element_by_id('root').click() #click on search button to fetch list of bus schedule

time.sleep(10) #depends on how long it will take to go to next page after button click

for i in range(1,50):
    url = "http://www.altrankarlstad.com/wisp".format(pagenum = i)

# Single quotes outside so the double quotes inside the XPath do not end the string
text_field = driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table')
for h3Tag in text_field:
    print(h3Tag.text)

Upvotes: 0

Views: 226

Answers (1)

Hùng Nguyễn

Reputation: 629

The page isn't fully loaded when you fetch it with Request. You can debug this by printing the contents of res. It seems the page uses JavaScript to load the table, so the table is not in the HTML that urlopen returns.
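To make that check concrete, here is a minimal Python 3 sketch that counts `<table>` tags in a string of HTML using only the standard library. The two HTML strings are hypothetical stand-ins (not the real page): one imitates what a plain HTTP request might return from a JavaScript-driven page, the other imitates the DOM after a browser has run the scripts. `TagCounter` and `count_tag` are names made up for this illustration.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each tag name opens in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html, tag):
    """Return the number of <tag> start tags found in the html string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.get(tag, 0)


# Hypothetical stand-in for the raw response of a JavaScript-driven page:
# an empty root div plus a script, and no table.
static_html = (
    '<html><body><div id="root"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical stand-in for the DOM after the browser has run the script.
rendered_html = (
    '<html><body><div id="root"><table class="table">'
    '<tr><td>1</td></tr></table></div></body></html>'
)

print(count_tag(static_html, "table"))    # 0
print(count_tag(rendered_html, "table"))  # 1
```

If the same count on the downloaded page is 0 while the browser's inspect view shows a table, the table is most likely being built by JavaScript after the initial load.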

You should use Selenium: load the page with a driver (e.g. chromedriver or the Firefox driver), sleep for a while until the page has loaded (you choose how long; it takes quite a while to load fully), then get the table using Selenium.

import time
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.altrankarlstad.com/wisp'

driver = webdriver.Chrome('/path/to/chromedriver')

driver.get(url)
# I don't understand the purpose of clicking that button, so the click is left out
time.sleep(100)  # give the page time to load the table

text_field = driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table')
print (text_field[0].text)

Your code worked fine with a bit of modification; this will print all the text from the table. You should learn to debug it and adapt it to get what you want.

This is my output running above scripts
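As an alternative to the fixed time.sleep(100) in the script above, the wait can be written as a polling loop that returns as soon as the table shows up. The following is a plain-Python sketch of that idea, not part of the original answer; `wait_until` is a hypothetical helper name, and the timeout and poll values are assumptions. Selenium also ships its own explicit-wait helper, `WebDriverWait` in `selenium.webdriver.support.ui`, which serves the same purpose.

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)


# Intended use with the script above (not executed here, it needs a browser):
#   xpath = '//*[@id="root"]/div/div/div/div[2]/table'
#   tables = wait_until(lambda: driver.find_elements_by_xpath(xpath), timeout=100)
#   print(tables[0].text)
```

This relies on find_elements_by_xpath returning an empty list while the table is missing, so the loop keeps polling until the list is non-empty or the timeout is reached.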

Upvotes: 1
