Reputation: 3199
I am trying to read some French text and do a frequency analysis of its words. I want characters with umlauts and other diacritics to be preserved, so I tested with this:
>>> import codecs
>>> f = codecs.open('file','r','utf-8')
>>> for line in f:
...     print line
...
Faites savoir à votre famille que vous êtes en sécurité.
So far, so good. But, I have a list of French files which I iterate over in the following way:
import codecs,sys,os
path = sys.argv[1]
for f in os.listdir(path):
    french = codecs.open(os.path.join(path,f),'r','utf-8')
    for line in french:
        print line
Here, it gives the following error:
rdholaki74: python TestingCodecs.py ../frenchResources | more
Traceback (most recent call last):
  File "TestingCodecs.py", line 7, in <module>
    print line
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 14: ordinal not in range(128)
Why is it that the same file throws up an error when passed as an argument and not when given explicitly in the code?
Thanks.
Upvotes: 2
Views: 2065
Reputation: 414745
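The error comes from print, and it is caused by the redirection: with stdout piped to more, Python 2 cannot detect the terminal's encoding and falls back to ASCII. You could set the output encoding explicitly: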
PYTHONIOENCODING=utf-8 python ... | ...
Specify another encoding if your terminal doesn't use UTF-8.
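To check what Python sees, a small diagnostic along these lines may help. It is only a sketch: describe_stream is a hypothetical helper name, not something from the question's script.

```python
import sys

def describe_stream(stream):
    """Return (is_tty, encoding) for a file-like object.

    describe_stream is an illustrative helper, not a standard function.
    """
    # Some file-like objects have no isatty(); treat those as "not a TTY".
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    # Not every stream carries an encoding attribute; default to None.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding
```

Calling describe_stream(sys.stdout) and writing the result to sys.stderr keeps the report visible even when stdout is piped. Under Python 2 the encoding should come back as None when the script is run with | more and PYTHONIOENCODING is unset, which is the case where print falls back to ASCII.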
Upvotes: 2
Reputation: 799230
Because you're misinterpreting the cause. How the file name is supplied makes no difference; the fact that you're piping the output means that Python can't detect what encoding to use, so it falls back to ASCII. If stdout is not a TTY then you'll need to encode as UTF-8 manually before outputting.
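A minimal sketch of encoding manually, assuming UTF-8 is the output encoding you want. The name write_line is hypothetical, and the getattr fallback is only there so the same code also runs on Python 3:

```python
import sys

def write_line(text, stream=None, encoding="utf-8"):
    """Encode a unicode string and write the resulting bytes to a stream.

    write_line is an illustrative helper; UTF-8 is an assumed default.
    """
    if stream is None:
        # Python 3 exposes the byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode(encoding))
```

In the loop from the question, write_line(line) would replace print line. No extra newline is needed, because each line read from the file already ends with one.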
Upvotes: 2