Reputation: 417

Encoding characters with ISO 8859-1 in Python

With ord(ch) you can get a numerical code for character ch up to 127. Is there any function that returns a number from 0-255, so to cover also ISO 8859-1 characters?
Edit: Follows my last version of code and error I get

#!/usr/bin/python
# coding: iso-8859-1

import sys
reload(sys)
sys.setdefaultencoding('iso-8859-1')
print sys.getdefaultencoding()  # prints "iso-8859-1" 

def char_code(c):
    return ord(c.encode('iso-8859-1'))
print char_code(u'à')

I get an error: TypeError: ord() expected a character, but string of length 2 found

Upvotes: 7

Answers (3)

Mark Ransom

Reputation: 308206

When you're starting with a Unicode string, you need to encode rather than decode.

>>> def char_code(c):
        return ord(c.encode('iso-8859-1'))

>>> print char_code(u'à')
224

For ISO-8859-1 in particular, you don't even need to encode it at all, since Unicode uses the ISO-8859-1 characters for its first 256 code points.

>>> print ord(u'à')
224

Edit: I see the problem now. You've given a source code encoding comment that indicates the source is in ISO-8859-1. However, I'll bet that your editor is actually working in UTF-8. The source code will be mis-interpreted, and the single-character string you think you created will actually be two characters. Try the following to see:

print len(u'à')

If your encoding is correct, it will return 1, but in your case it's probably 2.

Upvotes: 2

tripleee

Reputation: 189467

You can get ord() for anything. As you might expect, ord(u'💩') works fine, provided you can represent the character properly in your source, and/or read it in a known encoding.

Your error message vaguely suggests that coding: iso-8859-1 is not actually true, and the file's encoding is actually something else (UTF-8 or UTF-16 would be my guess).

The canonical must-read on character encoding in Python is http://nedbatchelder.com/text/unipain.html

Upvotes: 1

Rafael Telles

Reputation: 151

You can still use ord(), but you have to decode it.

Like this:

def char_code(c):
    return ord(c.decode('iso-8859-1'))

Upvotes: 0

Encoding characters with ISO 8859-1 in Python

Answers (3)

Related Questions