Reputation: 1667
I'm walking through a directory and want to write all the file names into a file. Here's the piece of code:
import os
with open("c:/Users/me/filename.txt", "a") as d:
    for dir, subdirs, files in os.walk("c:/temp"):
        for f in files:
            fname = os.path.join(dir, f)
            print fname
            d.write(fname + "\n")
    d.close()
The problem I have is that some files have names in Chinese characters. With print, I can see the file name correctly in the console, but in the target file it's just a mess. I've tried opening the file with open(u"c:/Users/me/filename.txt", "a"), but that did not work. I also tried writing fname.decode("utf-16"), which does not work either.
Upvotes: 10
Views: 15827
Reputation: 357
with open("xyz.xml", "w", encoding="utf-8-sig") as f: worked for me.
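As a rough illustration of what the utf-8-sig codec does differently from plain utf-8, the Python 3 sketch below writes a short file and inspects its raw bytes. The helper name write_with_bom and the demo file are hypothetical, not part of the original answer.

```python
import codecs
import os
import tempfile


def write_with_bom(path, text):
    """Write text as UTF-8 preceded by a byte order mark (BOM)."""
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


# Hypothetical demo file inside a fresh temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
write_with_bom(demo_path, u"中文.txt\n")

with open(demo_path, "rb") as f:
    raw = f.read()

# The file begins with the 3-byte UTF-8 BOM, followed by the encoded text.
has_bom = raw.startswith(codecs.BOM_UTF8)
```

Some Windows editors rely on that marker to detect UTF-8, which may be why this variant displays correctly where a file without a BOM does not.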
Upvotes: 2
Reputation: 179
To successfully write Chinese characters in Python 2, open the output file with codecs.open() and an explicit encoding, and write unicode strings to it.
The corrected code would be the following:
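import codecs
import os
with codecs.open("c:/Users/me/filename.txt", "a", encoding='utf-8') as d:
    for dir, subdirs, files in os.walk("c:/temp"):
        for f in files:
            fname = os.path.join(dir, f)
            print fname
            # Assumes the byte-string file names are UTF-8 encoded; on Windows
            # they may be in the ANSI code page, in which case this decode fails.
            d.write(fname.decode('utf-8') + "\n")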
The same problem does not exist in Python 3, so you should also consider making your script Python 3 compatible.
Upvotes: 3
Reputation: 51
The key is to tell Python up front that the file is to be written as UTF-8. I wonder why Python doesn't assume UTF-8 by default. Anyway, try the following:
with open("c:/Users/me/filename.txt", "a", encoding='utf-8') as d:
    for dir, subdirs, files in os.walk("c:/temp"):
        ...
I am using Python 3.5, so please be aware that the encoding option is not available in the built-in open() of Python 2.7. The idea, though, is to tell Python about the encoding in advance rather than fighting with the encoding of each string later.
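For reference, a complete Python 3 version might look like the sketch below. It assumes the loop body is the same as in the question; the function name write_file_list and its parameters are made up for illustration.

```python
import os


def write_file_list(root, out_path):
    """Append the full path of every file under root to out_path, one per line."""
    # The encoding is fixed once, when the file is opened, so every
    # write() below accepts ordinary str objects with any characters.
    with open(out_path, "a", encoding="utf-8") as out:
        for dirpath, subdirs, files in os.walk(root):
            for name in files:
                out.write(os.path.join(dirpath, name) + "\n")
```

With the paths from the question, the call would be write_file_list("c:/temp", "c:/Users/me/filename.txt").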
Upvotes: 3
Reputation: 3489
Use str.encode() to encode fname before you write it to the file:
d.write(fname.encode('utf8') + '\n')
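To see what encode() actually hands to write(), here is a small sketch with a made-up file name. In Python 2 this only works when fname is a unicode object (a byte string with non-ASCII characters raises UnicodeDecodeError on encode), and in Python 3 the resulting bytes need a file opened in binary mode such as "ab".

```python
fname = u"c:/temp/中文.txt"  # hypothetical file name

# encode() turns the text into UTF-8 bytes; each of the two Chinese
# characters here takes three bytes.
encoded = fname.encode("utf8")

# Decoding with the same codec gives back the original text.
round_trip = encoded.decode("utf8")
```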
Upvotes: 2
Reputation: 336428
In Python 2, it's a good idea to use codecs.open() if you're dealing with encodings other than ASCII. That way, you don't need to manually encode everything you write. Also, os.walk() should be passed a Unicode string if you're expecting non-ASCII characters in the filenames:
import codecs
import os
with codecs.open("c:/Users/me/filename.txt", "a", encoding="utf-8") as d:
    for dir, subdirs, files in os.walk(u"c:/temp"):
        for f in files:
            fname = os.path.join(dir, f)
            print fname
            d.write(fname + "\n")
No need to call d.close(), the with block already takes care of that.
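The point about passing a Unicode string to os.walk() comes down to this: the type of the path you pass in decides the type of the names you get back. The sketch below shows the Python 3 equivalent (str versus bytes) using a throwaway temporary directory; in Python 2 the same split exists between unicode and str.

```python
import os
import tempfile

# Throwaway directory holding a single file, just for the demonstration.
root = tempfile.mkdtemp()
open(os.path.join(root, "a.txt"), "w").close()

# A text path yields text (already decoded) file names.
text_names = [f for _, _, files in os.walk(root) for f in files]

# A bytes path yields raw, undecoded file names.
byte_names = [f for _, _, files in os.walk(os.fsencode(root)) for f in files]
```

Working with decoded names from the start means the file object can apply a single encoding on output, rather than each name having to be decoded by hand.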
Upvotes: 4