Raj
Raj

Reputation: 59

Selenium and BeautifulSoup can't fetch all HTML content

I'm scraping the bottom table labeled "Capacity : Operationally Available - Evening" on https://lngconnection.cheniere.com/#/ccpl

I am able to get all the HTML and everything shows up when I prettify() print the HTML but the parsers can't find it when I give a command to find the specific information I need.

Here's my script:

cc_driver = webdriver.Chrome('/Users/.../Desktop/chromedriver')
cc_driver.get('https://lngconnection.cheniere.com/#/ccpl')
cc_html = cc_driver.page_source

cc_content = soup(cc_html, 'html.parser')
cc_driver.close()
cc_table = cc_content.find('table', class_='k-selectable')
#print(cc_content.prettify())
print(cc_table.prettify())

now when I do the

print(cc_table.prettify())

The output is everything except the actual table data. Is there some error in my code or in their HTML that is hiding the actual table values? I'm able to see it when I print everything Selenium captures on the page. The HTML also doesn't have specific ID tags for any of the cell values.

Upvotes: 0

Views: 1465

Answers (2)

Ramakrishna Mamidi
Ramakrishna Mamidi

Reputation: 31

This should help you getting table html

from selenium import webdriver
from bs4 import BeautifulSoup as bs

cc_driver = webdriver.Chrome('../chromedriver_win32/chromedriver.exe')
cc_driver.get('https://lngconnection.cheniere.com/#/ccpl')
cc_html = cc_driver.page_source

cc_content = bs(cc_html, 'html.parser')
cc_driver.close()
cc_table = cc_content.find('table', attrs={'class':'k-selectable'})

#print(cc_content.prettify())
print(cc_table.prettify())

Upvotes: 0

MOHAMMED NUMAN
MOHAMMED NUMAN

Reputation: 183

You are looking into the HTML which is not yet complete. All the elements have not yet returned from the javascript. So you can do a webdriver wait.

from selenium import webdriver
from bs4 import BeautifulSoup as soup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

cc_driver = webdriver.Chrome(r"path for driver")
cc_driver.get('https://lngconnection.cheniere.com/#/ccpl')
WebDriverWait(cc_driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 
'#capacityGrid > table > tbody')))
cc_html = cc_driver.page_source

cc_content = soup(cc_html, 'html.parser')
cc_driver.close()
cc_table = cc_content.find('table', class_='k-selectable')
#print(cc_content.prettify())
print(cc_table.prettify())

This will wait for the element to be present.

Upvotes: 1

Related Questions