Reputation: 481
I am trying to write the contents of a collection to a file. The collection contains special characters like ¡, which cause a problem. For example, the collection has entries like:
{..., Name: ¡Hi!, ...}
Now I am trying to write the same into a file but I get the error
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)
I have tried the solutions provided here, but in vain. It would be great if someone could help me with this :)
So the example goes like this:
I have a collection which has the following details
{ "_id":ObjectId("5428ead854fed46f5ec4a0c9"),
"author":null,
"class":"culture",
"created":1411967707.356593,
"description":null,
"id":"eba9b4e2-900f-4707-b57d-aa659cbd0ac9",
"name":"¡Hola!",
"reviews":[
],
"screenshot_urls":[
]
}
Now I try to access the name entry from the collection, and I do that by iterating over the collection, i.e.
f = open("sample.txt", "w")
for val in exampleCollection:
    f.write("%s" % str(exampleCollection[val]).encode("utf-8"))
f.close()
Upvotes: 1
Views: 3116
Reputation: 16279
This removes every character in the string that is not valid ASCII:
>>> '¡Hola!'.encode('ascii', 'ignore').decode('ascii')
'Hola!'
Alternatively, you can write the file as UTF-8, which can encode any Unicode character, so nothing has to be dropped.
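A minimal sketch of the UTF-8 route, assuming the documents behave like dicts with text values. `example_docs` is a hypothetical stand-in for the collection in the question, not the asker's real data. `io.open` with an explicit `encoding` exists on both Python 2 and 3, and writing text directly avoids the `str()` call that triggers the implicit ASCII encode on Python 2.

```python
# -*- coding: utf-8 -*-
import io

# Hypothetical stand-in for the documents in the question's collection.
example_docs = [{u"name": u"¡Hola!", u"class": u"culture"}]

# io.open encodes text to UTF-8 on write, so no manual .encode() or str() is needed.
with io.open("sample.txt", "w", encoding="utf-8") as f:
    for doc in example_docs:
        f.write(u"%s\n" % doc[u"name"])

# Reading back with the same encoding returns the original text.
with io.open("sample.txt", "r", encoding="utf-8") as f:
    written = f.read()
```

Whatever reads the file later has to open it as UTF-8 as well, otherwise the ¡ will show up as two garbage characters.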
Upvotes: 1
Reputation: 19243
You're trying to convert unicode to ascii in "strict" mode:
>>> help(str.encode)
Help on method_descriptor:
encode(...)
    S.encode([encoding[,errors]]) -> object
    Encodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    'xmlcharrefreplace' as well as any other name registered with
    codecs.register_error that is able to handle UnicodeEncodeErrors.
You probably want something like one of the following:
s = u'¡Hi there!'
print s.encode('ascii', 'ignore')             # removes the ¡
print s.encode('ascii', 'replace')            # replaces the ¡ with ?
print s.encode('ascii', 'xmlcharrefreplace')  # replaces the ¡ with the XML character reference &#161;
print s.encode('ascii', 'strict')             # raises UnicodeEncodeError
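To see what each handler actually produces, here is a small sketch that collects the encoded byte strings instead of printing them. It is written to run on both Python 2 and 3; the `results` dict and `strict_raised` flag are names made up for this illustration.

```python
# -*- coding: utf-8 -*-
s = u"¡Hi there!"

# Encode with each non-strict handler and keep the resulting byte strings.
results = {}
for handler in ("ignore", "replace", "xmlcharrefreplace"):
    results[handler] = s.encode("ascii", handler)

# 'strict' (the default) raises instead of returning bytes.
try:
    s.encode("ascii", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
```

Only 'xmlcharrefreplace' keeps enough information to recover the original character later; 'ignore' and 'replace' are lossy.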
Upvotes: 0
Reputation: 11073
The easiest way to remove characters you don't want is to specify the characters you do.
>>> import string
>>> validchars = string.ascii_letters + string.digits + ' '
>>> s = '¡Hi there!'
>>> clean = ''.join(c for c in s if c in validchars)
>>> clean
'Hi there'
If some forms of punctuation are okay, add them to validchars.
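For instance, a sketch that also keeps `!` and `,` might look like this. The `keep_valid` helper and the particular punctuation chosen are assumptions for illustration; using a set just makes the membership test cheaper on long strings.

```python
# -*- coding: utf-8 -*-
import string

# Hypothetical extension of validchars: also allow '!' and ','.
validchars = set(string.ascii_letters + string.digits + " " + "!,")

def keep_valid(text, allowed=validchars):
    """Return text with every character outside `allowed` removed."""
    return "".join(c for c in text if c in allowed)

clean = keep_valid(u"¡Hi there!")
```

Here the trailing `!` survives because it is in the allowed set, while the `¡` is still dropped.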
Upvotes: 2
Reputation: 2099
As one user posted on this page, you should take a look at the Unicode tutorial in the docs: https://docs.python.org/2/howto/unicode.html
What's happening is that you're trying to encode a character that's outside the ASCII range, which covers a mere 128 code points (0 through 127). There's a really great article on this I found a while back, which I'll try to find and post here.
Edit: ah, here it is: http://www.joelonsoftware.com/articles/Unicode.html
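To make the 128-symbol limit concrete, here is a quick check of where ¡ falls. This is only an illustration of the point above, and the variable names are made up for it.

```python
# -*- coding: utf-8 -*-
ch = u"¡"

# The code point is 161 (0xA1), which is the u'\xa1' named in the error message.
code_point = ord(ch)

# ASCII only covers code points 0 through 127, so this is False.
fits_in_ascii = code_point < 128

# UTF-8 has no such limit; it stores this character as two bytes.
utf8_bytes = ch.encode("utf-8")
```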
Upvotes: 0