Bomin

Reputation: 1667

How to write Chinese characters to a file in Python

I'm walking through a directory and want to write all the file names into a file. Here's the code:

with open("c:/Users/me/filename.txt", "a") as d:
   for dir, subdirs, files in os.walk("c:/temp"):
      for f in files:
         fname = os.path.join(dir, f)
         print fname
         d.write(fname + "\n")
d.close()

The problem is that some files have names in Chinese characters. With print, I can see the file names correctly in the console, but in the target file they come out garbled. I've tried opening the file with open(u"c:/Users/me/filename.txt", "a"), but that did not work. I also tried writing fname.decode("utf-16"), which does not work either.

Upvotes: 10

Views: 15827

Answers (5)

user3349907

Reputation: 357

with open("xyz.xml", "w", encoding='utf-8-sig') as f: worked for me.
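For context, the 'utf-8-sig' codec writes a UTF-8 byte order mark (BOM) at the start of the file, which some Windows programs use to detect UTF-8. A minimal sketch, assuming Python 3 (where the built-in open() accepts encoding); the file name and sample text are made up for illustration:

```python
import os
import tempfile

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "names.txt")

# "utf-8-sig" behaves like "utf-8" but prepends a BOM when writing.
with open(out_path, "w", encoding="utf-8-sig") as f:
    f.write("中文文件.txt\n")

# Read the raw bytes to check for the BOM (EF BB BF).
with open(out_path, "rb") as f:
    raw = f.read()
print(raw[:3] == b"\xef\xbb\xbf")

# Reading back with the same codec strips the BOM again.
with open(out_path, "r", encoding="utf-8-sig") as f:
    text = f.read()
print(text == "中文文件.txt\n")
```

If the program that reads the file does not expect a BOM, plain 'utf-8' is probably the safer choice.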

Upvotes: 2

dsgou

Reputation: 179

To successfully write Chinese characters in Python 2, you have to do the following:

  1. Open the file with codecs.open(), which accepts an encoding parameter, and set it to 'utf-8'.
  2. Write unicode strings to the file, decoding byte strings first.

The corrected code would be the following:

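import codecs
import os

with codecs.open("c:/Users/me/filename.txt", "a", encoding='utf-8') as d:
    for dir, subdirs, files in os.walk("c:/temp"):
        for f in files:
            fname = os.path.join(dir, f)
            print fname
            # fname is a byte string here; decode('utf-8') assumes the names are
            # UTF-8 encoded. If they are not (Windows typically uses the ANSI code
            # page), decode with sys.getfilesystemencoding() instead.
            d.write(fname.decode('utf-8') + "\n")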

Note

Python 3 handles this much more simply, because str is Unicode and the built-in open() accepts an encoding argument, so you should also consider making your script Python 3 compatible.
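As a rough illustration of the Python 3 route, here is a minimal sketch; the helper names collect_paths and append_paths are hypothetical and not part of the question:

```python
import os


def collect_paths(root):
    """Return the full path of every file below root, as str."""
    paths = []
    for dirpath, _subdirs, files in os.walk(root):
        for name in files:
            paths.append(os.path.join(dirpath, name))
    return paths


def append_paths(paths, out_path):
    """Append one path per line to out_path, encoded as UTF-8."""
    # Passing encoding explicitly avoids depending on the platform default.
    with open(out_path, "a", encoding="utf-8") as out:
        for path in paths:
            out.write(path + "\n")
```

With the paths from the question, the call would look like append_paths(collect_paths("c:/temp"), "c:/Users/me/filename.txt").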

Upvotes: 3

JM_BJ

Reputation: 51

The key is to tell Python to prepare the file for the UTF-8 format. I wonder why Python doesn't assume UTF-8 by default. Anyway, try the following:

with open("c:/Users/me/filename.txt", "a", encoding='utf-8') as d:
    for dir, subdirs, files in os.walk("c:/temp"):
        ...

I am using Python 3.5. Please be aware that the built-in open() in Python 2.7 does not accept the "encoding" option. But the idea is to tell Python about the encoding in advance, rather than fighting with the encoding of each string later.
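One way to get the same behaviour on Python 2.7 is io.open(), which accepts the encoding argument there and is the same function as the built-in open() in Python 3. A minimal sketch; the helper name append_line and the sample file names are hypothetical:

```python
import io
import os
import tempfile


def append_line(path, text):
    """Append text plus a newline to path, stored as UTF-8."""
    # The encoding is declared once, when the file is opened, so the
    # caller passes Unicode text and never encodes it by hand.
    with io.open(path, "a", encoding="utf-8") as d:
        d.write(text + u"\n")


# Hypothetical target file in a temporary directory.
target = os.path.join(tempfile.mkdtemp(), "filename.txt")
append_line(target, u"c:/temp/报告.txt")
append_line(target, u"c:/temp/照片.jpg")

# Read the file back with the same encoding to check the round trip.
with io.open(target, "r", encoding="utf-8") as d:
    lines = d.read().splitlines()
print(len(lines))
```

On Python 2 the source file would also need a "# -*- coding: utf-8 -*-" line at the top for the non-ASCII literals.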

Upvotes: 3

pp_

Reputation: 3489

Use encode() to encode fname before you write it to the file (this assumes fname is a unicode object, for example because os.walk() was given a unicode path):

d.write(fname.encode('utf8') + '\n')
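In Python 3, encode() returns bytes, so the same manual approach needs the file opened in binary mode. A rough sketch under that assumption; the path and the sample name are hypothetical:

```python
import os
import tempfile

# Hypothetical output file and file name, for illustration only.
out_path = os.path.join(tempfile.mkdtemp(), "filename.txt")
fname = "c:/temp/中文.txt"

# Binary append mode: the file accepts bytes, so encode by hand.
with open(out_path, "ab") as d:
    d.write(fname.encode("utf8") + b"\n")

# Decoding the raw bytes as UTF-8 recovers the original text.
with open(out_path, "rb") as d:
    data = d.read()
print(data.decode("utf8") == fname + "\n")
```

Encoding by hand works, but every write has to remember to do it, which is why opening the file with an explicit encoding is usually less error-prone.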

Upvotes: 2

Tim Pietzcker

Reputation: 336428

In Python 2, it's a good idea to use codecs.open() if you're dealing with encodings other than ASCII. That way, you don't need to manually encode everything you write. Also, os.walk() should be passed a Unicode string if you're expecting non-ASCII characters in the filenames:

import codecs
import os

with codecs.open("c:/Users/me/filename.txt", "a", encoding="utf-8") as d:
    # A unicode path makes os.walk() return unicode file names.
    for dir, subdirs, files in os.walk(u"c:/temp"):
        for f in files:
            fname = os.path.join(dir, f)
            print fname
            d.write(fname + "\n")

No need to call d.close(); the with block already takes care of that.
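The rule about os.walk() carries over to Python 3 in a slightly different form: it returns names of the same type as the path it is given, str for a str path and bytes for a bytes path. A small sketch to check this, assuming Python 3 and using a temporary directory with a made-up file name:

```python
import os
import tempfile

# Build a throwaway directory with a single file in it.
root = tempfile.mkdtemp()
open(os.path.join(root, "example.txt"), "w").close()

# A str path yields str names ...
str_names = [f for _dir, _subdirs, files in os.walk(root) for f in files]

# ... while a bytes path yields bytes names.
bytes_root = os.fsencode(root)
bytes_names = [f for _dir, _subdirs, files in os.walk(bytes_root) for f in files]

print(type(str_names[0]).__name__, type(bytes_names[0]).__name__)
```

Sticking to str paths keeps every file name as text, so it can be written straight to a file opened with an encoding.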

Upvotes: 4
