Karthikeyan S

Reputation: 51

Detecting a file name with Unicode characters on Windows

Python version: 2.7.3

Filename: test snowman character --☃--.mp3

I ran the following tests; none of them succeeded.

>>> os.path.exists('test snowman character --☃--.mp3')
False
>>> os.path.exists(repr('test snowman character --☃--.mp3'))
False
>>> os.path.isfile('test snowman character --\\xe2\\x98\\x83--.mp3')
False
>>> os.path.isfile(r'test snowman character --\\xe2\\x98\\x83--.mp3')
False
>>> os.path.isfile('test snowman character --☃--.mp3'.decode('utf-8'))
False

I also tried to retrieve the files with glob, but that failed too.

The objective is to detect this file and copy it to another folder. Please advise.

Upvotes: 5

Views: 2512

Answers (3)

Hubro

Reputation: 59323

The Windows NTFS filesystem uses UTF-16 (just ask Martijn Pieters), so try this:

>>> os.path.exists(u'test snowman character --☃--.mp3'.encode("UTF-16"))

But first make sure the input encoding of the interpreter is correct. print repr(u'test snowman character --☃--.mp3') should output:

u'test snowman character --\u2603--.mp3'

Note: I am unable to test this as Windows CMD won't let me input snowman symbols. In any case, it turns out Python will do the right thing if you just give it a Unicode string, so the encode call is superfluous. To summarize, I recommend Martijn Pieters' answer.
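For illustration, here is a small sketch of how the different representations of the snowman relate. It uses only escape sequences, so it does not depend on the console or source file encoding, and it should run on Python 2.7 as well as 3.3+. The variable names are just for this example.

```python
from __future__ import print_function

snowman = u'\u2603'  # a single Unicode code point, U+2603

# The three bytes a UTF-8 console or source file holds for this character.
utf8_bytes = snowman.encode('utf-8')

# UTF-16 little-endian, the encoding Windows uses for file names.
utf16_bytes = snowman.encode('utf-16-le')

print(repr(utf8_bytes))
print(repr(utf16_bytes))

# Decoding the UTF-8 bytes gives back the same single character.
roundtrip = utf8_bytes.decode('utf-8')
print(roundtrip == snowman)
```

Passing either byte string as a path is not the same as passing the unicode value itself, which is why handing Python the unicode string directly is the simpler route.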

Upvotes: 1

Martijn Pieters

Reputation: 1121484

Use a unicode value; preferably with a unicode escape sequence:

os.path.isfile(u'test snowman character --\u2603--.mp3')

Python on Windows will use the correct Windows API for accessing files with UTF-16 names when you give it a unicode path.

For more information on how Python alters behaviour with unicode vs. bytestring file paths, see the Python Unicode HOWTO.
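As a rough sketch of the detect-and-copy step from the question, assuming every path is passed as a unicode value: the helper names copy_unicode_file and list_mp3s are made up for this example, not part of any library.

```python
from __future__ import print_function
import os
import shutil


def copy_unicode_file(src_dir, name, dst_dir):
    """Copy src_dir/name into dst_dir; return the new path, or None if missing.

    All three arguments are expected to be unicode strings (u'...').
    """
    src = os.path.join(src_dir, name)
    if not os.path.isfile(src):
        return None
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    dst = os.path.join(dst_dir, name)
    shutil.copy(src, dst)
    return dst


def list_mp3s(src_dir):
    # On Python 2, passing a unicode directory makes os.listdir
    # return unicode names instead of byte strings.
    return [n for n in os.listdir(src_dir) if n.lower().endswith(u'.mp3')]
```

A call might look like copy_unicode_file(u'.', u'test snowman character --\u2603--.mp3', u'backup'), where backup is a hypothetical destination folder.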

Upvotes: 3

SpliFF

Reputation: 38956

Literal Unicode strings are supposed to start with u', so try os.path.exists(u'test snowman character --☃--.mp3')

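If you want to use escape sequences, keep the plain u' prefix and use a \u escape, as in os.path.isfile(u'test snowman character --\u2603--.mp3'). A raw ur' literal leaves \x escapes uninterpreted, and \xe2\x98\x83 is the UTF-8 byte sequence rather than the code point.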

http://docs.python.org/2.7/reference/lexical_analysis.html#strings

Upvotes: 0
