Reputation: 21
I am trying to get table data with the code below, but surprisingly the script prints None for the table, even though I can clearly see it in the HTML document.
Looking forward to your help.
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup
site = 'http://www.altrankarlstad.com/wisp'
hdr = {'User-Agent': 'Chrome/78.0.3904.108'}
req = Request(site, headers=hdr)
res = urlopen(req)
rawpage = res.read()
page = rawpage.replace("<!-->", "")
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class":"table workitems-table mt-2"})
print (table)
Here is the code using Selenium, as suggested:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'http://www.altrankarlstad.com/wisp'
driver = webdriver.Chrome('C:\\Users\\rugupta\\AppData\\Roaming\\Microsoft\\Windows\\Start Menu\\Programs\\Python 3.7\\chromedriver.exe')
driver.get(url)
driver.find_element_by_id('root').click() #click on search button to fetch list of bus schedule
time.sleep(10) #depends on how long it will take to go to next page after button click
for i in range(1,50):
    url = "http://www.altrankarlstad.com/wisp".format(pagenum = i)
    text_field = driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table')
    for h3Tag in text_field:
        print(h3Tag.text)
Upvotes: 0
Views: 226
Reputation: 629
The page isn't fully loaded when you use Request. You can debug this by printing res.
It seems the page uses JavaScript to load the table.
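One way to confirm this is to count the tags in the raw HTML before any script has run. The following is only a sketch in Python 3 using the standard library; `TagCounter`, `count_tags` and the `raw_shell` string are made up for illustration, and `raw_shell` is a guess at what such a page typically returns, not the actual response from the site.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each start tag occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a dict mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical stand-in for the HTML a JavaScript-driven page sends
# before the browser has executed any script.
raw_shell = (
    '<html><body><div id="root"></div>'
    '<script src="/static/js/main.js"></script></body></html>'
)

counts = count_tags(raw_shell)
print("table tags:", counts.get("table", 0))
print("script tags:", counts.get("script", 0))
```

If the real response shows zero table tags but one or more script tags, the table is most likely built in the browser, and no parser setting will make soup.find return it.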
You should use Selenium: load the page with a driver (e.g. chromedriver or the Firefox driver), sleep until the page has loaded (you choose how long; this page takes quite a while to load fully), then get the table through Selenium.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'http://www.altrankarlstad.com/wisp'
driver = webdriver.Chrome('/path/to/chromedriver')
driver.get(url)
# I don't understand the purpose of clicking that button, so I left it out
time.sleep(100)
text_field = driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table')
print (text_field[0].text)
Your code works fine with a little modification; this prints all the text from the table. You should learn to debug it and adapt it to get what you want.
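A fixed time.sleep(100) waits the full time even if the table shows up after a few seconds. Selenium has a built-in WebDriverWait for this; the sketch below shows the same polling idea in plain Python so it can be tried without a browser. `wait_until` and its parameters are hypothetical names, not part of Selenium.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Assumed usage with the driver from the script above (not run here):
# tables = wait_until(
#     lambda: driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table'),
#     timeout=100,
# )
# print(tables[0].text)
```

Because find_elements_by_xpath returns an empty list when nothing matches, the lambda stays falsy until the table exists, and the script continues as soon as it appears.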
This is my output from running the script above.
Upvotes: 1