Charles Anderson

Reputation: 20039

Can't open a file with a Japanese filename in Python

Why doesn't this work in the Python interpreter? I am running Python 2.7 (python.exe) on Windows 7. My locale is en_GB.

open(u'黒色.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] invalid mode ('r') or filename: u'??.txt'

The file does exist, and is readable.

And if I try

name = u'黒色.txt'
name

the interpreter shows

u'??.txt'

Additional:

Okay, I was trying to simplify my problem for the purposes of this forum. Originally the filename arrived in a CGI script from a web page with a file picker. The idea was to let the web page user upload files to a server:

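import cgi
import os

form = cgi.FieldStorage()
fileItems = form['attachment[]']
# A single upload arrives as one item rather than a list.
if not isinstance(fileItems, list):
    fileItems = [fileItems]

for fileItem in fileItems:
    if fileItem.file:
        # Keep only the base name in case the browser sent a full path.
        fileName = os.path.split(fileItem.filename)[1]

        # Copy the upload to disk in chunks to bound memory use.
        with open(fileName, 'wb') as f:
            while True:
                chunk = fileItem.file.read(100000)
                if not chunk:
                    break
                f.write(chunk)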

but the files created at the server side had corrupted names. I started investigating this in the Python interpreter, reproduced the problem (so I thought), and that is what I put into my original question. However, I think now that I managed to create a separate problem.

Thanks to the answers below, I fixed the cgi script by making sure the file name is treated as unicode:

fileName = unicode(os.path.split(fileItem.filename)[1])
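Note that unicode() without an explicit encoding falls back to the ASCII codec in Python 2, so it may raise UnicodeDecodeError if the name arrives as a byte string containing non-ASCII characters. A minimal sketch of decoding explicitly, assuming the form page is served as UTF-8 so that the browser submits UTF-8 names; decode_filename is a hypothetical helper introduced here, not part of the cgi module:

```python
import ntpath


def decode_filename(raw_name, encoding='utf-8'):
    """Return the base name of an uploaded file as a text (unicode) string.

    Assumption: the browser submitted the name encoded as `encoding`.
    """
    if isinstance(raw_name, bytes):
        # Python 2 str (or Python 3 bytes): decode explicitly instead of
        # relying on the implicit ASCII default.
        raw_name = raw_name.decode(encoding)
    # ntpath splits on both '\\' and '/', which covers browsers that send
    # a full Windows client path.
    return ntpath.split(raw_name)[1]
```

In the upload loop this would replace the unicode(...) call, for example fileName = decode_filename(fileItem.filename).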

I never got my example in the interpreter to work. I suspect that is because my PC has the wrong locale for this.

Upvotes: 2

Views: 1975

Answers (2)

roeland

Reputation: 5741

Run IDLE if you want to work with Unicode strings interactively in Python. Then inputting or printing any characters will just work.
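If switching to IDLE is not an option, one possible workaround is to avoid typing the non-ASCII characters at the console at all and spell them as Unicode escapes, which are plain ASCII and so pass through any console code page unchanged. A minimal sketch, assuming a file system that can store the name; the escapes are the code points of 黒 and 色, and the temporary directory is used only to keep the demonstration self-contained:

```python
import io
import os
import tempfile

# u'\u9ed2\u8272' is the same string as the two Japanese characters, but the
# source text is pure ASCII, so the console input encoding cannot mangle it.
name = u'\u9ed2\u8272.txt'

# Throwaway directory for the demonstration (assumed to have an ASCII path
# when run under Python 2).
folder = tempfile.mkdtemp()
path = os.path.join(folder, name)

with io.open(path, 'w', encoding='utf8') as f:
    f.write(u'content')

with io.open(path, encoding='utf8') as f:
    text = f.read()

# The file content here is ASCII, so printing it is safe on any console.
print(text)
```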

Upvotes: 0

Mark Tolonen

Reputation: 177554

Here's an example script that reads and writes the file. You can use any encoding for the source file that supports the characters you are writing but make sure the #coding line matches. You can use any encoding for the data file as long as the encoding parameter matches.

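#coding:utf8
import io

# Write the data file; the encoding argument controls the file's contents,
# not its name.
with io.open(u'黒色.txt', 'w', encoding='utf8') as f:
    f.write(u'黒色.txt content')

# Read it back with the same encoding.
with io.open(u'黒色.txt', encoding='utf8') as f:
    print(f.read())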

Output:

黒色.txt content

Note that the print will only work if the terminal running the script supports Japanese; otherwise, you'll likely get a UnicodeEncodeError. I am on Windows and use an IDE that supports UTF-8 output, since the Windows console uses a legacy US-OEM encoding that doesn't support Japanese.
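One way to keep a script from failing on a console that cannot show Japanese is to replace the characters the console cannot encode with escapes before printing. A minimal sketch; console_safe is a hypothetical helper introduced here for illustration, and it assumes sys.stdout.encoding reports the console's real encoding:

```python
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters that `encoding` cannot represent
    replaced by backslash escapes such as \\u9ed2."""
    # Fall back to ASCII when the stream does not report an encoding
    # (for example when output is piped under Python 2).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Encode with escapes for unsupported characters, then decode back so
    # the result is still a text string.
    return text.encode(encoding, 'backslashreplace').decode(encoding)


print(console_safe(u'\u9ed2\u8272.txt content'))
```

On a UTF-8 terminal the text passes through unchanged; on a legacy code page the Japanese characters show up as escapes instead of raising an error.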

Upvotes: 1
