UTF Encoding in selenium webdriver

Question

I currently have the following:

from selenium import webdriver
d = webdriver.Chrome()
# request the url and get the page contents
title = result.find("span", {"class": "episode"}).find("a").text

However, the 'text' that is returned to me is:

# Note the truncation on the word "envol"
La grande envol

However, when I download the page source, it shows the following:

La grande envolée
    (1927)

Why is the text truncated in the webdriver response? How can I make sure it gives me the full UTF-8 encoded text?

alecxe · Accepted Answer

As far as I understand, you are passing the page_source contents to BeautifulSoup for further parsing.

I would not do that, since Selenium itself can handle the parsing part pretty well. For example, you can use CSS selectors:

from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, 'span.episode a').text

Example (using this IMDb page):

>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> driver = webdriver.Chrome()
>>> driver.get('http://www.imdb.com/title/tt1844708/')
>>> print(driver.find_element(By.XPATH, '//span[@itemprop="name"]').text)
La grande envolée
