phil

Reputation: 113

Python: How to use shutil.copy() with unicode filenames

So I've been pounding my head against this problem for days and I just can't figure it out. I've read this, this and this and feel like I must be missing something.

I'm trying to copy a simple text file with a complex unicode title into a temp folder with this code:

self._temp_path = tempfile.mkdtemp()
self.src = os.path.join(self._temp_path, 'src')
os.makedirs(self.src)
self.dst = os.path.join(self._temp_path, 'dst')
os.makedirs(self.dst)
self.dirname = dirname = os.path.join(os.path.dirname(__file__), 'testfiles')
f = u'file-\xe3\x82\xa8\xe3\x83\xb3\xe3\x83\x89\xe3\x83\xac\xe3\x82\xb9.txt'
src = os.path.join(dirname, f)
dst = os.path.join(self.src, f)
shutil.copy2(src, dst)

And I receive the following message when I execute the test:

s = '/tmp/tmpc1gzwf/src/file-ã¨ã³ãã¬ã¹.txt'
>           st = os.stat(s)
E           UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-38: ordinal not in range(128)

I've tried using both shutil.copy and shutil.copy2, they produced identical results. I've also tried changing:

shutil.copy2(src, dst)

to:

shutil.copy2(src.encode('utf-8'), dst.encode('utf-8'))

But that resulted in this error message, due to the encoding mangling the filename:

src = '/home/phil/projects/unicode_copy/tests/testfiles/file-\xc3\xa3\xc2\x82\xc2\xa8\xc3\xa3\xc2\x83\xc2\xb3\xc3\xa3\xc2\x83\xc2\x89\xc3\xa3\xc2\x83\xc2\xac\xc3\xa3\xc2\x82\xc2\xb9.txt'
dst = '/tmp/tmpCsb3qW/src/file-\xc3\xa3\xc2\x82\xc2\xa8\xc3\xa3\xc2\x83\xc2\xb3\xc3\xa3\xc2\x83\xc2\x89\xc3\xa3\xc2\x83\xc2\xac\xc3\xa3\xc2\x82\xc2\xb9.txt'
def copyfile(src, dst):
...
>       with open(src, 'rb') as fsrc:
E       IOError: [Errno 2] No such file or directory: '/home/phil/projects/unicode_copy/tests/testfiles/file-\xc3\xa3\xc2\x82\xc2\xa8\xc3\xa3\xc2\x83\xc2\xb3\xc3\xa3\xc2\x83\xc2\x89\xc3\xa3\xc2\x83\xc2\xac\xc3\xa3\xc2\x82\xc2\xb9.txt'

After trying many other combinations of encode() and decode() at various points in the code, I've given up. What is the proper way to declare a unicode filename and pass it to shutil.copy?

Upvotes: 2

Views: 3359

Answers (1)

Bart Van Loon

Reputation: 1510

I quickly ran the following code based on your code on my system and it seems to work just fine:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import shutil
import tempfile

from pathlib import Path


class Code():

    def run(self):
        self._temp_path = Path(tempfile.mkdtemp())
        self.dstdir = self._temp_path / 'dst'
        os.makedirs(self.dstdir)

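        self.srcdir = Path(os.path.dirname(__file__)) / 'testfiles'
        # Make sure the source directory exists before writing the test file into it
        os.makedirs(self.srcdir, exist_ok=True)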
        filename = u'file-\xe3\x82\xa8\xe3\x83\xb3\xe3\x83\x89\xe3\x83\xac\xe3\x82\xb9.txt'

        self.srcpath = self.srcdir / filename
        self.dstpath = self.dstdir / filename

        with open(self.srcpath, 'w') as f:
            f.write('test')

        shutil.copy2(self.srcpath, self.dstpath)


if __name__ == '__main__':
    code = Code()
    code.run()
    print(code.dstpath)

Sample output is /tmp/tmpgqwktb_v/dst/file-ã¨ã³ãã¬ã¹.txt.
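
A side note on that output: the name looks garbled because the literal from the question spells out UTF-8 bytes as \x escapes inside a text string, where each escape is a code point rather than a byte. Here is a minimal sketch of the difference. It assumes the file was meant to carry the katakana name エンドレス, which is my guess from decoding those bytes and not something the question states; the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-

# Literal from the question: in a text string every \xNN is one code point,
# so this holds 15 Latin-1 characters in the middle, not 5 katakana.
mangled = 'file-\xe3\x82\xa8\xe3\x83\xb3\xe3\x83\x89\xe3\x83\xac\xe3\x82\xb9.txt'

# Reinterpret those code points as raw bytes, then decode the bytes as UTF-8.
intended = mangled.encode('latin-1').decode('utf-8')

# The same name written directly (assumed to be the intended filename).
direct = 'file-エンドレス.txt'

print(len(mangled), len(intended))  # 24 14
print(intended == direct)           # True

# Encoding the mangled string as UTF-8 encodes each Latin-1 character again,
# which gives the \xc3\xa3\xc2\x82... sequence seen in the question's traceback.
double_encoded = mangled.encode('utf-8')
print(double_encoded[5:11])
```

If that guess is right, it would also account for the IOError in the question: src.encode('utf-8') yields the double-encoded bytes, which cannot match a file whose name on disk is the plain UTF-8 encoding of エンドレス.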

Possible reasons why it works for me and not for you:

  • I'm using Python 3, which has notably better Unicode support
  • I'm running Linux in the en_GB.UTF-8 locale
  • The second line of my script declares the encoding of the source file

Perhaps the differences with your environment can explain your error.
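
To compare the two environments, something like the following could be run on both machines. It is only a rough sketch: describe_environment is a hypothetical helper name I made up, and the set of values it reports is my own pick of settings that commonly affect how text paths are turned into bytes.

```python
import locale
import os
import sys


def describe_environment():
    """Collect settings that influence how text paths are encoded for the OS."""
    return {
        'python': sys.version.split()[0],
        'filesystem_encoding': sys.getfilesystemencoding(),
        'preferred_encoding': locale.getpreferredencoding(False),
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
    }


if __name__ == '__main__':
    for key, value in describe_environment().items():
        print('{}: {}'.format(key, value))
```

If filesystem_encoding comes back as an ASCII variant on your side, that would be consistent with the 'ascii' codec error raised from os.stat in your traceback.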

Hope this helps!

Upvotes: 2
