Reputation: 196
I keep getting this error when I try to print or use driver.page_source:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1494-1498: character maps to <undefined>
It happens not only with Selenium but also with requests and urllib3. I tried several solutions, such as str(), .encode("utf-8-sig") or .encode("utf-8"), and BeautifulSoup(source, from_encoding="utf-8"), but none of them worked.
My code:
import base64
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.example.com/")
source = driver.page_source
driver.close()
# page_source is a str, so write it in text mode with an explicit encoding
with open("test.html", "w", encoding="utf-8") as W:
    W.write(source)
soup = BeautifulSoup(source.encode("utf-8"), "html.parser")
print(soup.find_all("img"))  # the UnicodeEncodeError is raised when printing
Any idea how to make it work?
Upvotes: 0
Views: 927
Reputation: 196
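Finally, I solved it!
The problem is caused by the Arabic text in the page, which the console's 'charmap' codec cannot encode.
Work around it by encoding the content to bytes before printing, so the console never has to encode the Arabic characters: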
.encode("utf-8","ignore")
But first you have to convert the soup result to a string:
str(soup.find("span","captcha-container")).encode("utf-8","ignore")
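For illustration, here is a minimal sketch of why the error appears and two ways around it. The sample string and the helper name make_safe_writer are hypothetical, and cp1252 is only an assumption about the console encoding (the traceback just says 'charmap'). An in-memory buffer stands in for the console so the result can be inspected.

```python
import io

# Hypothetical sample standing in for page content with Arabic text
# (written with escapes so the snippet itself is pure ASCII).
html = '<span class="captcha-container">\u0645\u0631\u062d\u0628\u0627</span>'

# Reproduce the failure: cp1252 (assumed here) has no mapping for Arabic letters.
try:
    html.encode("cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True

# Workaround from this answer: turn the text into UTF-8 bytes before printing.
# UTF-8 can represent Arabic, so nothing is actually dropped from the data.
as_bytes = html.encode("utf-8", "ignore")


def make_safe_writer(raw, encoding="cp1252"):
    """Wrap a binary stream so unencodable characters become '?' instead of raising."""
    return io.TextIOWrapper(raw, encoding=encoding, errors="replace", newline="\n")


# Alternative: keep printing str, but through a stream with a lenient error handler.
buffer = io.BytesIO()  # stands in for the console
writer = make_safe_writer(buffer)
print(html, file=writer)
writer.flush()
printed = buffer.getvalue()
```

On Python 3.7+, calling sys.stdout.reconfigure(encoding="utf-8") or setting the PYTHONIOENCODING environment variable to utf-8 are other options, provided the terminal can display UTF-8.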
Upvotes: 1