Reputation: 110267
I currently have the following:
from selenium import webdriver
d = webdriver.Chrome()
# request the url and get the page contents
title = result.find("span", {"class": "episode"}).find("a").text
However, the 'text' that is returned to me is:
# Note the truncation on the word "envol"
<td class="title"><a href="/title/tt1844708/">La grande envol</a></td>
However, when I download the page source, it shows the following:
<td class="title"><a href="/title/tt1844708/">La grande envolée</a>
<span class="year_type">(1927)</span><br />
</td>
Why is the text truncated in the webdriver response? How can I make sure I get the full UTF-8 encoded text?
Upvotes: 2
Views: 4073
Reputation: 473883
As far as I understand, you are passing the page_source contents to BeautifulSoup for further parsing.
I would not do that, since selenium itself can handle the parsing part pretty well. For example, you can use CSS selectors:
driver.find_element_by_css_selector('span.episode a').text
Example (using this IMDb page):
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('http://www.imdb.com/title/tt1844708/')
>>> print(driver.find_element_by_xpath('//span[@itemprop="name"]').text)
La grande envolée
Upvotes: 1