Reputation: 417
With ord(ch) you can get a numerical code for character ch up to 127. Is there any function that returns a number from 0-255, so as to also cover ISO 8859-1 characters?
Edit: Here is my latest version of the code and the error I get:
#!/usr/bin/python
# coding: iso-8859-1
import sys
reload(sys)
sys.setdefaultencoding('iso-8859-1')
print sys.getdefaultencoding() # prints "iso-8859-1"
def char_code(c):
    return ord(c.encode('iso-8859-1'))
print char_code(u'à')
I get an error: TypeError: ord() expected a character, but string of length 2 found
Upvotes: 7
Views: 26967
Reputation: 308206
When you're starting with a Unicode string, you need to encode rather than decode.
>>> def char_code(c):
...     return ord(c.encode('iso-8859-1'))
...
>>> print char_code(u'à')
224
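A minimal sketch of the same encode-then-ord approach, assuming Python 3 (where str is Unicode and encode() returns bytes). Because ISO-8859-1 maps every character it can represent to exactly one byte, the result is always in 0-255; characters outside the charset raise UnicodeEncodeError instead. The euro sign is only an illustrative out-of-range character.

```python
def char_code(c):
    """Return the ISO-8859-1 byte value (0-255) of a single character."""
    encoded = c.encode('iso-8859-1')  # length-1 bytes for any Latin-1 character
    return ord(encoded)               # Python 3 ord() accepts a length-1 bytes object

print(char_code(u'\u00e0'))  # 'à' -> 224

try:
    char_code(u'\u20ac')  # EURO SIGN is not in ISO-8859-1
except UnicodeEncodeError as exc:
    print('not encodable:', exc.reason)
```

Writing the character as an escape (\u00e0) keeps the example independent of how the source file itself is encoded.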
For ISO-8859-1 in particular, you don't even need to encode it at all, since Unicode uses the ISO-8859-1 characters for its first 256 code points.
>>> print ord(u'à')
224
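The claim that Unicode reuses the ISO-8859-1 characters for its first 256 code points can be checked exhaustively. This is a small sketch assuming Python 3; the function name is hypothetical.

```python
def latin1_matches_unicode():
    """True if every ISO-8859-1 byte value equals the code point it decodes to."""
    all_bytes = bytes(range(256))               # every possible byte, 0x00-0xFF
    decoded = all_bytes.decode('iso-8859-1')    # 256 one-character strings' worth of text
    return [ord(ch) for ch in decoded] == list(range(256))

print(latin1_matches_unicode())  # True
print(ord(u'\u00e0'))            # 224, no encoding step needed
```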
Edit: I see the problem now. You've given a source code encoding comment that indicates the source is in ISO-8859-1. However, I'll bet that your editor is actually working in UTF-8. The source code will be mis-interpreted, and the single-character string you think you created will actually be two characters. Try the following to see:
print len(u'à')
If your encoding is correct, it will return 1, but in your case it's probably 2.
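The mismatch described above can be reproduced without touching the editor. This sketch, assuming Python 3 and a hypothetical helper simulate_mismatch, saves 'à' as UTF-8 and then reads the bytes back as ISO-8859-1, which is what a wrong coding comment effectively does. UTF-8 stores 'à' as the two bytes 0xC3 0xA0, so the "single character" becomes two, and the question's ord(c.encode('iso-8859-1')) fails with a TypeError.

```python
def simulate_mismatch(text):
    """Hypothetical helper: bytes saved as UTF-8, then read back as ISO-8859-1."""
    saved = text.encode('utf-8')        # what a UTF-8 editor writes to disk
    return saved.decode('iso-8859-1')   # what a coding: iso-8859-1 comment makes Python read

misread = simulate_mismatch(u'\u00e0')  # intended to be the single character 'à'
print(len(misread))                     # 2
print([ord(ch) for ch in misread])      # [195, 160]

try:
    ord(misread.encode('iso-8859-1'))   # same call as the question's char_code()
except TypeError as exc:
    print('TypeError:', exc)
```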
Upvotes: 2
Reputation: 189467
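You can use ord() on any single character. As you might expect, ord(u'💩') works fine (on Python 3.3+ or a wide Unicode build of Python 2; narrow builds store it as two surrogates), provided you can represent the character properly in your source, and/or read it in a known encoding.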
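A short sketch, assuming Python 3.3 or later, where every str holds full code points so even characters outside the Basic Multilingual Plane have length 1. The character is written as an escape so the result does not depend on the source file's encoding.

```python
pile_of_poo = u'\U0001F4A9'   # U+1F4A9, written as an escape
print(len(pile_of_poo))       # 1
print(ord(pile_of_poo))       # 128169
print(hex(ord(pile_of_poo)))  # 0x1f4a9
print(chr(128169) == pile_of_poo)  # True: chr() is the inverse of ord()
```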
Your error message vaguely suggests that coding: iso-8859-1 is not actually true, and the file's encoding is actually something else (UTF-8 or UTF-16 would be my guess).
The canonical must-read on character encoding in Python is http://nedbatchelder.com/text/unipain.html
Upvotes: 1
Reputation: 151
You can still use ord(), but you have to decode the byte string first.
Like this:
def char_code(c):
    return ord(c.decode('iso-8859-1'))
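In Python 3 decode() exists only on bytes, so this approach applies when you hold raw bytes (for example, data read from a file opened in binary mode). A minimal sketch under that assumption; it also shows that indexing a Python 3 bytes object already yields the 0-255 value directly.

```python
def char_code(b):
    """Code (0-255) of a one-byte ISO-8859-1 string, via decoding to text first."""
    return ord(b.decode('iso-8859-1'))  # bytes -> 1-character str, then ord()

raw = b'\xe0'          # 'à' as a single ISO-8859-1 byte
print(char_code(raw))  # 224
print(raw[0])          # 224: indexing bytes gives an int in Python 3
```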
Upvotes: 0