Selenium webdriver and URIError: "String contained an illegal UTF-16 sequence"

Question

Background: I have been learning "WebDriver" and "BeautifulSoup" for only two days.

Problem: I use the following code to download a webpage:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.PhantomJS(executable_path)
driver.get('https://mojim.com/twy100468x17x18.htm')
pageSource = driver.page_source
...

Then I encountered this error:

WebDriverException: Message: URIError - String contained an illegal UTF-16 sequence.

Tried: I replaced pageSource = driver.page_source with
(driver.page_source).encode('ascii', 'ignore')
(driver.page_source).encode('utf-8') (suggested here)
but both still end with the same error.

Page Source here

What should I do? Is there illegal text in the HTML, or is it something else?
Thank you.

Alexey Trofimov · Accepted Answer

I've just run into this and got past it. It is caused by various non-UTF characters in the page.
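
As a rough illustration of what an "illegal UTF-16 sequence" can look like, the sketch below assumes the offending characters are unpaired surrogate code points (U+D800 to U+DFFF). That is a guess, not something confirmed for this page. find_lone_surrogates is a hypothetical helper; it could be run on page source obtained some other way to check whether such characters are present.

```python
def find_lone_surrogates(text):
    """Return the indices of surrogate code points in a Python str.

    Text decoded in the normal (strict) way does not contain code points
    in U+D800..U+DFFF, so any that appear are unpaired halves of a
    UTF-16 surrogate pair.
    """
    return [i for i, ch in enumerate(text) if 0xD800 <= ord(ch) <= 0xDFFF]


if __name__ == "__main__":
    sample = "abc\ud800def"  # made-up string with one lone high surrogate
    print(find_lone_surrogates(sample))
    # A strict UTF-8 encode of this string would raise UnicodeEncodeError,
    # so an error handler is needed to encode it at all:
    print(sample.encode("utf-8", "replace"))
```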

Surprisingly, I solved this with the Edge driver (Chrome and Mozilla don't handle it). So you can use it:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Edge()
driver.get('https://mojim.com/twy100468x17x18.htm')
pageSource = driver.page_source

The catch is that Edge is not headless like PhantomJS, so when scraping I use it only for the bad links that raise this exception. Also, Edge is almost as fast as PhantomJS.
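
Using Edge only for the bad links could be sketched roughly as follows. This is a minimal sketch, not my exact code: get_page_source, fallback_factory and errors are names made up for the illustration, and the only things assumed about the drivers are get(), page_source and quit().

```python
def get_page_source(url, primary, fallback_factory, errors=(Exception,)):
    """Return the page source via `primary`; on failure, retry with a fallback.

    primary          -- an already-started driver (e.g. the headless one)
    fallback_factory -- a zero-argument callable that starts the fallback driver
    errors           -- exception types that should trigger the fallback
    """
    try:
        primary.get(url)
        return primary.page_source
    except errors:
        # Start the fallback browser only for links the primary cannot handle.
        fallback = fallback_factory()
        try:
            fallback.get(url)
            return fallback.page_source
        finally:
            # Always close the fallback browser, even if it fails too.
            fallback.quit()
```

With the names from the code above, a call might look like get_page_source(url, driver, webdriver.Edge, errors=(WebDriverException,)), where WebDriverException is imported from selenium.common.exceptions. Starting Edge per bad link keeps the visible browser closed most of the time, at the cost of a startup delay on each fallback.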
