I'm using Selenium to extract data from a web page and write it to a file, but I'm having trouble: when I write a special character like 'é', it shows up as unreadable characters in the file (é). The website I'm getting the page from is encoded in ISO-8859-1, and I'm using Python 2.7.
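A minimal sketch of one common way 'é' turns into 'é', assuming the file really contains UTF-8 bytes that something (an editor or viewer) then decodes as ISO-8859-1. It uses only built-in string methods; the variable names are illustrative.

```python
# Assumption: the text was saved as UTF-8 but later read back as ISO-8859-1.
text = u"\xe9"                              # one character: U+00E9, e with acute accent
utf8_bytes = text.encode("utf-8")           # two bytes on disk: 0xC3 0xA9
misread = utf8_bytes.decode("iso-8859-1")   # each byte becomes its own character
# misread is U+00C3 followed by U+00A9, the two characters seen in the file.

# Reversing the wrong decode recovers the original character.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repr(utf8_bytes))
print(repr(misread))
```

If this is what happened, the file itself is fine and only the program reading it needs to be told the data is UTF-8.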
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(URL_SITE_ENCODED_IN_iso-8859-1)
html = browser.page_source.decode('iso-8859-1')  # raises UnicodeEncodeError
From what I understood, I have to decode the page from ISO-8859-1 and then encode it as UTF-8, but when I try, an error is raised: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 246: ordinal not in range(128)
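The decode-then-encode model is correct when you start from raw bytes. A minimal sketch, assuming the input is the ISO-8859-1 byte string for "café" (a made-up sample, not data from the page):

```python
# Raw bytes as ISO-8859-1: the accented e is the single byte 0xE9.
latin1_bytes = b"caf\xe9"
# bytes -> text (a unicode object in Python 2, str in Python 3)
text = latin1_bytes.decode("iso-8859-1")
# text -> bytes again, this time as UTF-8: the accented e becomes 0xC3 0xA9
utf8_bytes = text.encode("utf-8")
print(repr(text))
print(repr(utf8_bytes))
```

The decode step only makes sense on bytes; if the value is already text, there is nothing to decode.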
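It's probably because browser.page_source is already decoded Unicode. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the default ASCII codec, and that implicit step is what fails on u'\xe9'. Check with: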
>>> type(browser.page_source)
<type 'unicode'>
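If you are not sure whether a value is still bytes or already text, a small guard avoids decoding twice. This is a sketch; `to_text` is a hypothetical helper, not part of Selenium. In Python 2, `bytes` is an alias for `str`, so the same check works there.

```python
# Hypothetical helper (not part of Selenium): decode only when given bytes.
def to_text(value, encoding="iso-8859-1"):
    """Return value as text, decoding it only if it is still raw bytes."""
    if isinstance(value, bytes):
        # Raw bytes: decode with the page's declared encoding.
        return value.decode(encoding)
    # Already text (unicode in Python 2, str in Python 3): leave it alone.
    return value

print(repr(to_text(b"caf\xe9")))
print(repr(to_text(u"caf\xe9")))
```

Passing `browser.page_source` through it would simply return the text unchanged.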
When you write browser.page_source to a file, you need to encode it with an appropriate encoding. In Python 2.x, use io.open from the io module, which returns a file object that encodes the text automatically when you write it. Try:
import io
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(anysite)
with io.open("myoutfile.txt", "w", encoding="utf-8") as my_file:
    my_file.write(browser.page_source)
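To check what io.open actually puts on disk without starting a browser, here is a minimal sketch that substitutes a hard-coded unicode string for browser.page_source (an assumption for testing only) and writes it to a temporary directory:

```python
import io
import os
import tempfile

# Stand-in for browser.page_source: already-decoded text.
page_source = u"<p>caf\xe9</p>"

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "myoutfile.txt")

# io.open in text mode encodes each write() call as UTF-8.
with io.open(path, "w", encoding="utf-8") as my_file:
    my_file.write(page_source)

# Read the raw bytes back to confirm what landed on disk.
with io.open(path, "rb") as raw_file:
    raw = raw_file.read()

print(repr(raw))
```

The accented character is stored as the two UTF-8 bytes 0xC3 0xA9, so the file must be opened as UTF-8 (not ISO-8859-1) to display 'é' correctly.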