Reputation: 475
Problem statement:
While automatically copying files between input directories, and output directories my program fails on a path that contains unicode (most likely Korean) characters.
The whole script is publicly available under: This Link
The file that causes the error is also publicly available: File That Causes the Error
The specific part of the code that fails seems to be:
for root, _, filenames in os.walk(maybe_dir):
for file in filenames:
# Prepare relative paths:
relative_dir = os.path.relpath(root, maybe_dir)
relative_file = os.path.join(relative_dir, file)
# Get unique filename:
unique_filename = uuid.uuid4().hex
unique_filename_with_ext = unique_filename + file_extension
new_path_and_filename = os.path.join(
full_output_path, unique_filename_with_ext
)
current_file = os.path.abspath(os.path.join(root, file))
# Copying files:
shutil.copy(current_file, new_path_and_filename)
The error:
Traceback (most recent call last):
File "F:\Projects\SC2DatasetPreparator\src\directory_flattener.py", line 96, in <module>
directory_flattener(
File "F:\Projects\SC2DatasetPreparator\src\directory_flattener.py", line 60, in directory_flattener
shutil.copy(current_file, new_path_and_filename)
File "D:\Programs\Python3_10\lib\shutil.py", line 417, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "D:\Programs\Python3_10\lib\shutil.py", line 254, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'F:\\Projects\\SC2DatasetPreparator\\processing\\directory_flattener\\input\\2017_IEM_XI_World_Championship_Katowice\\IEM XI - World Championship - StarCraft II Replays\\RO24\\Group A\\Solar Vs herO\\Ùë¦ý+ñÝü¼ ý×¼Û¦£Ù¦£ ýºÇÛÁ¼ - ÝåáÙäêÙ¿+Ýè© (Û¦ÁÝùêýØÿ ý£áýé¦) (2) Solar vs Hero game 1.SC2Replay'
The error itself is unexpected as I am using absolute paths and the script works for 6 other directories before it fails on that specific file.
The path clearly exists and can be accessed manually:
Steps to attempt to reproduce the error are as follows:
./processing/directory_flattener/input/test_dir
Closing Remarks:
It seems that the script worked before on Python 3.7 because I have verified the output that I have received before updating to Python 3.10 and within the directory mapping that is created the files with unicode characters in their path are present:
{
"ce2f4610891e472190a0852c617b35e8": "RO24\\Group A\\Solar Vs herO\\\u00d9\u00eb\u00a6\u00fd+\u00f1\u00dd\u00fc\u00bc \u00fd\u00d7\u00bc\u00db\u00a6\u00a3\u00d9\u00a6\u00a3 \u00fd\u00ba\u00c7\u00db\u00c1\u00bc - \u00dd\u00e5\u00e1\u00d9\u00e4\u00ea\u00d9\u00bf+\u00dd\u00e8\u00a9 (\u00db\u00a6\u00c1\u00dd\u00f9\u00ea\u00fd\u00d8\u00ff \u00fd\u00a3\u00e1\u00fd\u00e9\u00a6) (2) Solar vs Hero game 1.SC2Replay",
"dcc82d633910479c95d06ef418fcf2e0": "RO24\\Group A\\Solar Vs herO\\\u00fd\u00fb\u00a6\u00d9\u00a6\u00e4\u00fd\u00e4\u00f1 \u00d9\u00aa\u00bc\u00dd\u00f6\u00e4 - \u00d9\u00d7\u00ff\u00d9\u00ec\u00f6 Solar vs Hero game 2.SC2Replay",
}
While searching for an answer I have only stumbled upon similar problems in Python 2 where the .decode()
method was suggested as a solution. Applying such measures did not help to solve the issue.
Upvotes: 1
Views: 520
Reputation: 475
The cause for this error was the Maximum Path Length Limitation which limited the ability to use paths longer than 260 characters on Windows.
The error was fixed by adding a prefix of "\\?\"
to the path that was used to access and copy the file.
This means that the following line of code was changed:
current_file = os.path.abspath(os.path.join(root, file))
Into this:
current_file = "\\\\?\\" + os.path.abspath(os.path.join(root, file))
Upvotes: 1