Reputation: 207
I'm trying to read a CSV file that contains information in simplified Chinese, and encode it into a request to put into the database.
Section of my code:
#coding:utf-8
import csv, sys, urllib, urllib2
with open('testdata1.csv', 'rU') as f:
    reader = csv.reader(f)
    try:
        z = csv.reader(f, delimiter='\t')
        for row in reader:
            print row[0]
            if row[0] in (None, ""):
                continue
            elif row[0] == '家长姓':
                print row[0]
However I'm encountering two problems:
1) Sublime Text does not seem to understand Chinese characters: the comparison elif row[0] == '家长姓' never finds '家长姓'.
2) Sublime Text doesn't seem to be able to print Chinese characters (when I tell it to print some of the information, all Chinese characters are replaced by underscores).
I've already tried File>Save with Encoding>UTF-8 to no avail. Any help would be appreciated.
Upvotes: 2
Views: 7292
Reputation: 1987
'家长姓' in your code is a <type 'str'>, and the content you read from the file is also a <type 'str'>, but their encodings may not be the same. You can decode both to <type 'unicode'> before comparing them.
For example:
row[0].decode('utf-8') == u'家长姓'
And here is a test about str and unicode:
test = '你好'
test1 = u'你好'
print type(test)
print type(test1)
print test == test1
print type(test.decode('utf-8'))
print test.decode('utf-8') == test1
output:
<type 'str'>
<type 'unicode'>
False
<type 'unicode'>
True
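Applied to a CSV cell, the same idea can be sketched as below. This is only an illustration, assuming the file is UTF-8 encoded; the names to_text, TARGET and raw_cell are hypothetical, and the __future__ imports are there so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Text literal (unicode on Python 2 because of unicode_literals).
TARGET = "家长姓"

# Stand-in for a cell as the Python 2 csv module returns it: raw UTF-8 bytes.
raw_cell = TARGET.encode("utf-8")

# Decode first, then compare text with text.
matches = to_text(raw_cell) == TARGET
print(matches)
```

Decoding once at the point where data enters the program keeps every later comparison text-to-text.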
Upvotes: 1
Reputation: 148890
Non-ASCII characters are always hard to use because several different encodings are involved, among them:
1) the encoding declared for the source file (# -*- coding: ... -*- on the first or second line)
2) the output encoding (sys.stdout.encoding) that will be used for rendering them
First, your coding line omits the -*- markers, meaning that some editors could fail to correctly process the encoding.
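The rendering side can be inspected directly. The following is a rough sketch, not a diagnosis of Sublime Text itself: the helper names describe_stdout_encoding and can_render are hypothetical, and it assumes the replaced characters come from an output encoding that cannot represent Chinese text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import sys


def describe_stdout_encoding(stream=None):
    """Return the encoding reported by the output stream, or 'unknown'."""
    stream = sys.stdout if stream is None else stream
    return getattr(stream, "encoding", None) or "unknown"


def can_render(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except (UnicodeError, LookupError):
        return False
    return True


print(describe_stdout_encoding())
print(can_render("家长姓", "ascii"))
print(can_render("家长姓", "utf-8"))
```

If the reported encoding cannot represent the characters, the problem lies in the output console rather than in the CSV file or the source code.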
You could also check whether the IDLE editor handles the Chinese characters more easily.
But anyway, if everything else fails, you can always use explicit Unicode escape codes:
>>> txt = u'家长姓' # only works if the editor and the interpreter agree on the declared source encoding
>>> txt2 = u'\u5bb6\u957f\u59d3' # code point escapes, works on any system
>>> txt == txt2
True
TL;DR: if you have problems using non-ASCII characters in Python source, use their escaped codes.
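Code point escapes (\uXXXX) and UTF-8 byte escapes (\xNN) are easy to confuse, so here is a small sketch of how they relate for these three characters. The variable names are illustrative only, and the __future__ imports are there so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Code points of the three characters, written as text escapes.
escaped = "\u5bb6\u957f\u59d3"

# The same three characters as raw UTF-8 bytes (three bytes each).
utf8_bytes = b"\xe5\xae\xb6\xe9\x95\xbf\xe5\xa7\x93"

# The bytes only equal the text after they have been decoded.
same = utf8_bytes.decode("utf-8") == escaped
print(same)
print(len(escaped), len(utf8_bytes))
```

The text has 3 characters while its UTF-8 form has 9 bytes, which is why \x byte values inside a unicode literal do not spell the Chinese characters.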
Upvotes: 1
Reputation: 12087
Try opening the file using codecs with the appropriate encoding:
>>> import codecs
>>> f = codecs.open("testdata1.csv", "r", "utf-8")
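A slightly fuller sketch of that approach follows. It assumes the file is UTF-8 and tab-separated; the sample rows and the temporary file are hypothetical stand-ins for testdata1.csv so that the snippet is self-contained.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import codecs
import os
import tempfile

# Hypothetical sample data standing in for testdata1.csv.
sample = "家长姓\t家长名\n王\t小明\n"

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    # Write the sample as UTF-8 so there is something to read back.
    with codecs.open(path, "w", "utf-8") as out:
        out.write(sample)

    # codecs.open decodes while reading, so every line is already text.
    first_cells = []
    with codecs.open(path, "r", "utf-8") as f:
        for line in f:
            cells = line.rstrip("\r\n").split("\t")
            first_cells.append(cells[0])
finally:
    os.remove(path)

print(first_cells[0] == "家长姓")
```

Splitting on tabs by hand is enough for this simple sample, but it does not handle quoted fields the way the csv module does. On Python 3, the built-in open(path, encoding="utf-8") does the same decoding.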
Upvotes: 1