David542

Reputation: 110267

UTF Encoding in selenium webdriver

I currently have the following:

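from bs4 import BeautifulSoup
from selenium import webdriver

d = webdriver.Chrome()
# request the url and get the page contents
# (assumed: the page source is then parsed with BeautifulSoup into `result`)
result = BeautifulSoup(d.page_source, "html.parser")
title = result.find("span", {"class": "episode"}).find("a").text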

However, the 'text' that is returned to me is:

# Note the truncation on the word "envol"
<td class="title"><a href="/title/tt1844708/">La grande envol</a></td>

However, when I download the page source, it shows the following:

<td class="title"><a href="/title/tt1844708/">La grande envolée</a>
    <span class="year_type">(1927)</span><br />
</td>

Why is the text truncated in the webdriver response? How can I make sure I get the full UTF-8 text?

Upvotes: 2

Views: 4073

Answers (1)

alecxe

Reputation: 473883

As far as I understand, you are passing the page_source contents to BeautifulSoup for further parsing.

I would not do that, since Selenium can handle the parsing itself. For example, you can locate the element with a CSS selector (By is imported from selenium.webdriver.common.by; the older find_element_by_* helpers were removed in Selenium 4):

driver.find_element(By.CSS_SELECTOR, 'span.episode a').text

Example (using this IMDb page):

>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> driver = webdriver.Chrome()
>>> driver.get('http://www.imdb.com/title/tt1844708/')
>>> print(driver.find_element(By.XPATH, '//span[@itemprop="name"]').text)
La grande envolée
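
If you need this in more than one place, the lookup can be wrapped in a small helper. This is only a sketch: the function name episode_title is made up for illustration, the default selector is the one from the question and may not match the page's current markup, and the string "css selector" is used because it is the value of Selenium's By.CSS_SELECTOR constant.

```python
def episode_title(driver, css="span.episode a"):
    """Return the visible text of the first element matching `css`.

    `driver` is assumed to be a Selenium WebDriver (or any object with a
    compatible `find_element(by, value)` method) that has already loaded
    the page.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    element = driver.find_element("css selector", css)
    # .text is returned as a Python str by WebDriver, so no separate
    # decoding step is applied here.
    return element.text
```

With a live browser you would call driver.get(...) first, then episode_title(driver), and finally driver.quit() once you are done.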

Upvotes: 1
