Shashi Shankar Singh

Reputation: 185

Unable to get the expected HTML element details using Python

I am trying to scrape a website using Python. The scrape itself runs without errors, but it does not return the content I expect. I think it has something to do with the JavaScript on the web page.

My code is below:

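import re

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumed; the original snippet did not show how the driver was created
driver.get(
        "https://my website")

soup = BeautifulSoup(driver.page_source, 'lxml')
all_text = soup.text
# Replace newlines, tabs and carriage returns with spaces, then collapse runs of spaces
ct = all_text.replace('\n', ' ')
cl_text = ct.replace('\t', ' ')
cln_text_t = cl_text.replace('\r', ' ')
cln_text = re.sub(' +', ' ', cln_text_t)
print(cln_text)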

Instead of giving me the website details, it gives me the data below. Any idea how I could fix this?

html, body {height:100%;margin:0;} You have to enable javascript in your browser to use an application built with Vaadin.........

Upvotes: 0

Views: 121

Answers (1)

Dmitri T

Reputation: 168157

Why do you need BeautifulSoup at all? It only parses the HTML it is handed and does not execute JavaScript.

If you need the text of the web page, you can fetch the document root with the simple XPath selector //html and read the innerText property of the resulting WebElement.

Suggested code change:

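from selenium.webdriver.common.by import By

driver.get(
        "my website")

# find_element_by_xpath was removed in recent Selenium 4 releases; use find_element with By.XPATH
root = driver.find_element(By.XPATH, "//html")

# innerText holds the text as rendered by the browser
all_text = root.get_attribute("innerText")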
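
If the whitespace cleanup from the question is still wanted, it can be applied to the innerText string as well. The following is only a minimal sketch: the helper name normalize_whitespace is hypothetical (it does not come from Selenium or BeautifulSoup), and it assumes the goal is the same as the chain of replace calls in the question, done in a single regular expression.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, newlines, tabs and carriage returns into one space.

    Hypothetical helper; it mirrors the replace/re.sub chain from the question.
    """
    return re.sub(r"[ \n\t\r]+", " ", text)


if __name__ == "__main__":
    # Sample string standing in for the value of all_text
    sample = "First line\n\tSecond   line\r\nThird line"
    print(normalize_whitespace(sample))
```

With the code above, this would be called as cln_text = normalize_whitespace(all_text).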

Upvotes: 1
