Reputation: 300
I have the following script to process filenames with non-Latin characters:
import os
filelst = []
allfile = os.listdir(os.getcwd())
for file in allfile:
    if os.path.isfile(file):
        filelst.append(file)
w = open(os.getcwd()+'\\_filelist.txt','w+')
for file in allfile:
    w.write(file)
    w.write("\n")
w.close()
File list in my folder:
new 1.py
ああっ女神さまっ 小っちゃいって事は便利だねっ.1998.Ep01-08.x264.AC3-CalChi.avi
ああっ女神さまっ 小っちゃいって事は便利だねっ.1998.Ep01-08.x264.AC3-CalChi.srt
Output in _filelist.txt:
new 1.py
???????? ??????????????.1998.Ep01-08.x264.AC3-CalChi.avi
???????? ??????????????.1998.Ep01-08.x264.AC3-CalChi.srt
Upvotes: 0
Views: 613
Reputation: 27744
You should get the list of files as Unicode strings instead, by passing a Unicode path to os.listdir. In Python 2, since you're using getcwd, use os.getcwdu() instead.
Then open your output file with a text encoding wrapper. The io module is the newer way to do this (io handles universal newlines correctly).
Putting it all together:
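import os
import io
filelst = []
# os.getcwdu() returns a Unicode path, so listdir returns Unicode names
allfile = os.listdir(os.getcwdu())
for file in allfile:
    if os.path.isfile(file):
        filelst.append(file)
w = io.open(os.getcwdu()+u'\\_filelist.txt','w+', encoding="utf-8")
# write only the regular files collected above
for file in filelst:
    w.write(file)
    # io text files accept only Unicode strings in Python 2, hence u"\n"
    w.write(u"\n")
w.close()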
On Windows and OS X, this will just work, because filename translation to and from Unicode is enforced. On Linux, a filename is just a sequence of bytes and can be in any encoding (or none at all!). Therefore, ensure that whatever creates your files (the .avi and .srt) uses UTF-8, that your terminal is set to UTF-8, and that your locale is UTF-8.
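For anyone on Python 3, here is a minimal sketch of the same approach, assuming Python 3, where os.listdir already returns text strings and os.getcwdu() no longer exists. The function name write_file_list and its out_name parameter are made up for illustration; skipping the output file itself is a choice of this sketch, not something the original script does.

```python
import io
import os


def write_file_list(folder, out_name="_filelist.txt"):
    """Write the names of the regular files in `folder` to a UTF-8 text file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    names = sorted(
        name for name in os.listdir(folder)
        # Join with the folder so the check does not depend on the cwd,
        # and skip the output file so reruns give the same list.
        if os.path.isfile(os.path.join(folder, name)) and name != out_name
    )
    out_path = os.path.join(folder, out_name)
    # The explicit encoding avoids falling back to the locale's default.
    with io.open(out_path, "w", encoding="utf-8") as w:
        for name in names:
            w.write(name + u"\n")
    return names
```

Calling write_file_list(os.getcwd()) should behave like the script above, with the file closed automatically by the with block.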
Upvotes: 2
Reputation: 107357
You need to open your file with a proper encoding to write Unicode to it. You can use the codecs module to open the file:
import codecs
import os
# allfile is the list built by os.listdir in the question's script
with codecs.open(os.getcwd()+'\\_filelist.txt','w+',encoding='your-encoding') as w:
    for file in allfile:
        w.write(file + '\n')
You can use UTF-8 as your encoding, since it can represent every Unicode character, or another encoding that covers the characters in your filenames. Also note that instead of opening and closing the file manually, you can use the with statement, which closes the file automatically at the end of the block.
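To illustrate why the choice of encoding matters, here is a rough sketch that tries to write one Japanese filename with several encodings. The helper try_write and the sample name are hypothetical, made up for this example. An encoding that cannot represent the characters (such as ascii) should raise UnicodeEncodeError, while utf-8 and shift_jis can encode this particular name.

```python
import codecs
import os
import tempfile

# Hypothetical sample filename ("ああっ.avi"), written with escapes so the
# source file's own encoding does not matter.
name = u"\u3042\u3042\u3063.avi"


def try_write(path, text, encoding):
    """Return True if `text` could be written to `path` in `encoding`."""
    try:
        # The with block closes the file even if the write fails.
        with codecs.open(path, "w", encoding=encoding) as w:
            w.write(text + u"\n")
        return True
    except UnicodeEncodeError:
        return False


out_path = os.path.join(tempfile.mkdtemp(), "_filelist.txt")
for enc in ("utf-8", "ascii", "shift_jis"):
    print(enc, try_write(out_path, name, enc))
```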
Upvotes: 1