Jawad

Reputation: 196

Python Selenium - UnicodeEncodeError 'charmap' codec can't encode

I keep getting this error when I try to print or use driver.page_source:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 1494-1498: character maps to <undefined>

This happens not only with Selenium but also with requests and urllib3. I tried several solutions, but none of them worked, such as str(), .encode("utf-8-sig") or .encode("utf-8"), and BeautifulSoup(source, from_encoding="utf-8").
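For context, here is a minimal reproduction of where the 'charmap' message comes from. It assumes the output (console, IDE, or a file opened without an explicit encoding) uses cp1252, a common Windows default; the sample string is hypothetical. It also suggests why adding .encode("utf-8") to the HTML does not fix anything by itself: the UTF-8 encode never fails, and the error only appears when the text is encoded with the narrower codec.

```python
# Hypothetical sample: a little HTML containing Arabic text.
text = "<p>مرحبا</p>"

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")

# cp1252 has no Arabic letters, so a strict encode fails with the same
# kind of "'charmap' codec can't encode characters ..." error.
error_message = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```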

My code:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.example.com/")
source = driver.page_source  # a str (Unicode text), not bytes
driver.quit()

# page_source is text, so open the file in text mode with an explicit encoding
with open("test.html", "w", encoding="utf-8") as f:
    f.write(source)

soup = BeautifulSoup(source, "html.parser")
print(soup.find_all("img"))  # may raise UnicodeEncodeError if stdout can't encode the page's characters

Any idea how to make it work?

Upvotes: 0

Views: 927

Answers (1)

Jawad

Reputation: 196

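Finally I solved it! The problem was the Arabic text in the page: the 'charmap' codec used for output can't represent Arabic characters. Encoding the content to UTF-8 bytes before printing avoids the error: .encode("utf-8", "ignore") (UTF-8 can represent Arabic, so "ignore" doesn't actually drop anything; print() simply shows the escaped bytes instead of the characters).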

But first you have to turn the soup result into a string, so:

str(soup.find("span","captcha-container")).encode("utf-8","ignore")
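A small sketch of why the line above stops the error, plus a commonly used alternative that keeps the Arabic text readable instead of printing escaped bytes. It assumes Python 3.7+ (for sys.stdout.reconfigure); the sample string and the helper names force_utf8_stdout and save_html are hypothetical, not from the original post.

```python
import sys
from pathlib import Path

# Hypothetical sample standing in for str(soup.find("span", "captcha-container")).
html = '<span class="captcha-container">مرحبا</span>'

# 1) What the answer's line does: UTF-8 can encode every character, so
#    "ignore" drops nothing; printing bytes shows their ASCII-only repr
#    (b'<span ...>\xd9\x85...'), which any console can display.
data = html.encode("utf-8", "ignore")
print(data)

# With a codec that lacks Arabic letters, "ignore" really does drop them.
lossy = html.encode("cp1252", "ignore")  # b'<span class="captcha-container"></span>'

# 2) Alternative: keep the text as str and make the outputs UTF-8 instead.
def force_utf8_stdout() -> None:
    """Switch stdout to UTF-8 if the stream supports reconfigure()."""
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")

def save_html(text: str, path: Path) -> None:
    """Write text as UTF-8 instead of the platform default encoding."""
    path.write_text(text, encoding="utf-8")

if __name__ == "__main__":
    force_utf8_stdout()
    save_html(html, Path("test.html"))
    print(html)
```

Whether the Arabic characters then display correctly still depends on the terminal or IDE font and settings; setting the environment variable PYTHONIOENCODING=utf-8 before starting Python is another way to get the same stdout behavior without changing the code.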

Upvotes: 1
