prostock

Reputation: 9545

upload file with unicode filename

I have a file with a Unicode name (e.g. Chinese characters). When I upload it, I get a UnicodeEncodeError. I'm using a PostgreSQL database with UTF-8 encoding and the Django development server on 64-bit Ubuntu Lucid. What am I missing? In models.py I do the following, where filename is the Unicode name of the file:

from django.db import models

def get_upload_path(instance, filename):
    return filename  # UnicodeEncodeError if filename has non-Latin-1 characters

class Kind(models.Model):
    style = models.ForeignKey(Style)
    kind_file = models.FileField(upload_to=get_upload_path)

From the shell:

[screenshot not preserved]

Upvotes: 3

Views: 4772

Answers (2)

super9

Reputation: 30111

Django comes with some helpful functions which you can use here: https://docs.djangoproject.com/en/dev/ref/unicode/#conversion-functions

I think smart_str is what you need.

An alternative is to rename the files that are being uploaded by your users.
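A minimal sketch of the renaming approach, assuming you do not need to keep the original name on disk. The function keeps the (instance, filename) signature from the question; the choice of a random UUID as the stored name is my own assumption, not something Django requires.

```python
import os
import uuid


def get_upload_path(instance, filename):
    """Return a generated ASCII name instead of the user-supplied one."""
    # splitext returns (root, ext); ext includes the leading dot or is empty.
    extension = os.path.splitext(filename)[1]
    try:
        # Keep the extension only if it is plain ASCII.
        extension.encode("ascii")
    except UnicodeError:
        extension = ""
    # uuid4().hex is 32 hexadecimal characters, so the result is ASCII-only.
    return uuid.uuid4().hex + extension
```

If the original name matters to your users, you could store it in a separate text field on the model and use the generated name only for the file on disk.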

Upvotes: 0

Daenyth

Reputation: 37441

I believe the problem is with your string formatting. Python 2 automatically converts between the str type (a sequence of bytes) and the unicode type (an abstract sequence of Unicode code points).

I'm assuming that your filename is of type unicode.

"tmp/%s/%s" is a byte string, so Python will try to convert automatically so that the types match. The problem is that it uses the ASCII codec for that implicit conversion, which can't represent your data.
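To see why the ASCII codec is the sticking point, here is a small sketch that performs the encoding step explicitly. The filename is a made-up example, not the one from the question.

```python
# Hypothetical filename with two Chinese characters, written with escapes
# so the example does not depend on the source file's encoding.
filename = u"\u6587\u4ef6.txt"

try:
    filename.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # ASCII has no representation for code points above 127.
    ascii_ok = False

# UTF-8 can represent every code point; each of these characters takes 3 bytes.
utf8_bytes = filename.encode("utf-8")
```

Here ascii_ok ends up False, while the UTF-8 encoding succeeds. Python 2 attempts the ASCII conversion for you whenever str and unicode are mixed, which is why the error can show up far from any explicit encode call.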

Changing your return statement to use temp2 instead of filename should work, since now you're using the right types together.


For the future, I'd also recommend watching the presentation I linked to in the comments, as it covers several strategies for avoiding this class of problem. The main one is to use bytes only at the edges of your program: as soon as you receive bytes from the outside world, decode them to unicode, and encode only when you send data back out. You should also use unicode string literals internally (u"" instead of "").
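A rough sketch of that decode-early, encode-late pattern. The function names and the UTF-8 default are assumptions for illustration; the encoding of your real input depends on where it comes from.

```python
def read_name(raw_bytes, encoding="utf-8"):
    """Decode bytes as soon as they enter the program."""
    return raw_bytes.decode(encoding)


def build_path(directory, name):
    """Work only with unicode text internally (note the u"" literal)."""
    return u"%s/%s" % (directory, name)


def write_name(text, encoding="utf-8"):
    """Encode only at the point where data leaves the program."""
    return text.encode(encoding)


incoming = b"\xe6\x96\x87\xe4\xbb\xb6.txt"  # UTF-8 bytes from outside
path = build_path(u"tmp", read_name(incoming))
outgoing = write_name(path)
```

Because every intermediate value is unicode, no implicit ASCII conversion is ever triggered; the only places an encoding is named are the two boundary functions.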

I'd also recommend more meaningful variable names than tempN.

Upvotes: 3
