Reputation: 23
I'm having an issue when asserting two non-ASCII values. One comes from a CSV file and the other is read from an element in the HTML:
<h1 class="LoginElement">登录</h1>
I'm using selenium to get the text
w_msg = driver.find_element(By.CSS_SELECTOR, "h1.LoginElement").text
When I assert both values
assert txt in w_msg
I get the following error msg:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 0: ordinal not in range(128)
If I print both variables and their types:
print txt
print type(txt)
print w_msg
print type(w_msg)
It returns the following:
登入
<type 'str'>
登录
<type 'unicode'>
This is how I'm initializing the CSV file from my "Utility" class:
def open_csv(base_csv, file_name):
    csv_file = open(base_csv + file_name, 'rb')
    reader = csv.reader(csv_file, delimiter=',')
    row = list(reader)
    return row
And here's the call from the test:
csv = Utility.open_csv(base_csv, file_name)
NOTE: I'm using OpenOffice Calc to build the csv and saving it in UTF-8
I've tried lots of solutions found on SO but still can't get it to work. Any help or a lead in the right direction would be much appreciated.
Upvotes: 2
Views: 894
Reputation: 27714
Python is trying to convert your str to unicode to carry out the comparison. Unfortunately, Python 2.x errs on the side of caution and decodes your string using ASCII only, which fails on the first non-ASCII byte.
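The failure can be reproduced without Selenium or a CSV file. This is a minimal sketch, assuming the CSV cell holds the UTF-8 bytes for 登录 (the byte values are an illustration, not taken from the actual file). Decoding them as ASCII, which is what Python 2 does implicitly when a str meets a unicode, raises the same error:

```python
# UTF-8 bytes for the two characters U+767B U+5F55,
# in the form a Python 2 csv.reader would hand them back
raw = b'\xe7\x99\xbb\xe5\xbd\x95'

try:
    # Python 2 performs this ASCII decode implicitly when mixing str and unicode
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```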
You need to decode txt to unicode yourself, using the encoding the CSV file was saved in, so Python doesn't have to.
You could do this with txt.decode(), but the best way is to have Python decode it for you as you read the file.
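For the manual route, here is a rough sketch of decoding every cell after the rows have been read. The helper name decode_cells and the sample rows are hypothetical, and the file is assumed to be UTF-8 as stated in the question:

```python
def decode_cells(rows, encoding='utf-8'):
    """Decode every byte-string cell in a list of CSV rows to text."""
    return [[cell.decode(encoding) for cell in row] for row in rows]

# Hypothetical rows, shaped like the list open_csv() returns under Python 2
rows = [[b'\xe7\x99\xbb\xe5\xbd\x95', b'login']]
decoded = decode_cells(rows)

txt = decoded[0][0]
w_msg = u'\u767b\u5f55'  # stands in for the text Selenium returns
assert txt in w_msg      # both sides are text now, so no implicit ASCII decode
```

This keeps the stdlib csv module, at the cost of a second pass over the data.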
Unfortunately, the Python 2.x csv module doesn't support Unicode, so you need to use the drop-in replacement: https://github.com/jdunck/python-unicodecsv
Use it like:
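import unicodecsv

# Open in binary mode; unicodecsv decodes the bytes itself
with open("myfile.csv", "rb") as my_csv:
    r = unicodecsv.reader(my_csv, encoding=YOURENCODING)
    rows = list(r)  # every cell is now a unicode object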
YOURENCODING may be utf-8, cp1252 or any codec listed here: https://docs.python.org/2/library/codecs.html#standard-encodings
If the CSV has come from Excel then it's likely to be a codec beginning with cp.
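Picking the right codec matters because a wrong one often decodes without raising and simply produces the wrong characters. A small sketch, again assuming the bytes are the UTF-8 encoding of 登录:

```python
raw = b'\xe7\x99\xbb\xe5\xbd\x95'  # UTF-8 bytes for U+767B U+5F55

as_utf8 = raw.decode('utf-8')      # the intended two characters
as_cp1252 = raw.decode('cp1252')   # no error, but six unrelated characters

print(as_utf8 == u'\u767b\u5f55')
print(len(as_utf8))
print(len(as_cp1252))
```

If the assertion still fails after decoding, comparing the lengths or the repr() of both values is a quick way to spot a codec mismatch.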
Upvotes: 2