galacticninja

Reputation: 131

How do I allow opening of files that have Unicode characters in their filenames?

I have this Python script here that opens a random video file in a directory when run:

import glob,random,os  
files = glob.glob("*.mkv")  
files.extend(glob.glob("*.mp4"))  
files.extend(glob.glob("*.tp"))  
files.extend(glob.glob("*.avi"))  
files.extend(glob.glob("*.ts"))  
files.extend(glob.glob("*.flv"))  
files.extend(glob.glob("*.mov"))  
file = random.choice(files)  
print "Opening file %s..." % file  
cmd = "rundll32 url.dll,FileProtocolHandler \"" + file + "\""  
os.system(cmd)

Source: An answer in my Super User post, 'How do I open a random file in a folder, and set that only files with the specified filename extension(s) should be opened?'

This is called by a BAT file, with this as its script:

C:\Python27\python.exe "C:\Programs\Scripts\open-random-video.py" cd   

I put this BAT file in the directory I want to open random videos of.

In most cases it works fine. However, I can't make it open files with Unicode characters (like Japanese or Korean characters in my case) in their filenames.

This is the error message when the BAT file and Python script are run in a directory and pick a file with Unicode characters in its filename:

C:\TestDir>openrandomvideo.BAT

C:\TestDir>C:\Python27\python.exe "C:\Programs\Scripts\open-random-video.py" cd
The filename, directory name, or volume label syntax is incorrect.

Note that the filename of the .FLV video file in that log is changed from its original filename (소시.flv) to '∩╗┐' in the command line log.

EDIT: I learned that the above command line error message is due to saving the BAT file as 'UTF-8 with BOM'. Saving it as 'ANSI or UTF-16' shows the following message instead, but still does not open the file:

C:\TestDir>openrandomvideo.BAT

C:\TestDir>C:\Python27\python.exe "C:\Programs\Scripts\open-random-video.py" cd
Opening file ??.flv...

Now, the filename of the .FLV video file in that log is changed from its original filename (소시.flv) to '??.flv.' in the command line log.

I'm using Python 2.7 on Windows 7, 64-bit.

How do I allow opening of files that have Unicode characters in their filenames?

Upvotes: 2

Views: 1551

Answers (4)

Baffe Boyois

Reputation: 2140

The error when running the BAT file is because the BAT file itself is saved as "UTF-8 with BOM". The "∩╗┐" characters are not a corrupted filename; they are the literal first bytes stored in the BAT file (the byte order mark), as displayed by the console. Re-save the BAT file as ANSI or UTF-16, which are the only encodings supported for BAT files.
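
To check whether a batch file carries the BOM before re-saving it, a small script along these lines could be used. This is only a sketch: the helper names has_utf8_bom, strip_utf8_bom and bom_as_shown_in_console are made up for illustration, and code page 437 is an assumption about the console (yours may differ).

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True

def bom_as_shown_in_console(codepage="cp437"):
    """Decode the BOM bytes the way a console using `codepage` would display them."""
    return codecs.BOM_UTF8.decode(codepage)
```

With code page 437, bom_as_shown_in_console() gives the same three characters quoted in the question. Stripping the BOM leaves a UTF-8 file, which matches an ANSI save only when the batch file contains nothing but ASCII, as the one in the question does.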

Upvotes: 2

oefe

Reputation: 19916

Either use Unicode literals as described by J. F. Sebastian, or use Python 3, which always uses Unicode.

(For Python 3, your script will need a minor modification: print is a function now, so you have to put parentheses around the parameter list.)
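
As a rough illustration of that modification, here is one way the question's script might look on Python 3. It is a sketch, not a drop-in replacement: the function names pick_random_video and open_random_video are invented here, and os.startfile (Windows-only) is swapped in for the rundll32 command line so that the filename never passes through the shell.

```python
import glob
import os
import random

# File patterns taken from the question's script.
PATTERNS = ["*.mkv", "*.mp4", "*.tp", "*.avi", "*.ts", "*.flv", "*.mov"]

def pick_random_video(directory="."):
    """Return a random matching file path in `directory`, or None if there is none."""
    files = []
    for pattern in PATTERNS:
        # In Python 3, glob returns str objects, which are Unicode text.
        files.extend(glob.glob(os.path.join(directory, pattern)))
    return random.choice(files) if files else None

def open_random_video(directory="."):
    """Pick a random video and open it with its associated program (Windows only)."""
    path = pick_random_video(directory)
    if path is None:
        print("No video files found in %s" % directory)
        return
    print("Opening file %s..." % path)  # print is a function in Python 3
    os.startfile(path)  # only available on Windows
```

Calling open_random_video() from the directory that holds the videos would then replace the body of the original script.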

Upvotes: 0

jfs

Reputation: 414139

Just use Unicode literals, e.g. u".mp4", everywhere. IO functions in Python will return Unicode filenames if you give them Unicode input (internally they might use a Unicode-aware Windows API):

import os
import random

videodir = u"." # get videos from current directory
extensions = tuple(u".mkv .mp4 .tp .avi .ts .flv .mov".split())
files = [file for file in os.listdir(videodir) if file.endswith(extensions)]
if files: # at least one video file exists
    random_file = random.choice(files)
    os.startfile(os.path.join(videodir, random_file)) # start the video
else:
    print('No %s files found in "%s"' % ("|".join(extensions), videodir,))
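
The point about Unicode input giving Unicode output can be checked directly. The sketch below (the helper name listdir_name_types is hypothetical) lists the same directory twice, once with a Unicode path and once with a byte-string path, and reports which types the names come back as.

```python
import os

def listdir_name_types(text_dir=u".", bytes_dir=b"."):
    """Return the sets of name types produced for a Unicode and a byte-string path."""
    # Unicode path in: names come back as text.
    text_types = set(type(name) for name in os.listdir(text_dir))
    # Byte-string path in: names come back as byte strings.
    byte_types = set(type(name) for name in os.listdir(bytes_dir))
    return text_types, byte_types
```

On Python 2.7 the first set should typically contain unicode and the second str; on Python 3 they are str and bytes respectively. Names returned as byte strings are where characters outside the ANSI code page can end up as "?".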

If you want to emulate how your web browser would open video files, then you could use webbrowser.open() instead of os.startfile(), though the former might use the latter internally on Windows anyway.
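
A minimal sketch of switching between the two calls is shown here. The function name open_video and its use_browser flag are assumptions made for the example; only os.startfile() and webbrowser.open() come from the standard library.

```python
import os
import webbrowser

def open_video(path, use_browser=False):
    """Open `path` via webbrowser.open() or, where it exists, os.startfile()."""
    if use_browser or not hasattr(os, "startfile"):
        # webbrowser.open() returns True if a browser could be launched.
        return webbrowser.open(path)
    os.startfile(path)  # Windows-only: uses the file's associated program
    return True
```

Falling back to webbrowser.open() when os.startfile() is missing keeps the function usable outside Windows, although what the browser does with a local video file depends on the platform.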

Upvotes: 3

emaniacs

Reputation: 72

Please get into the habit of adding # -*- coding: utf-8 -*- at the top of your source code, so that Python knows how to interpret the Unicode text in it.
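
For reference, this is roughly where the declaration goes and what it affects. The names video_name and describe are invented for this sketch. Note that the declaration only concerns non-ASCII literals written in the source file itself; it does not change the type of the names that glob or os.listdir return.

```python
# -*- coding: utf-8 -*-
# The declaration above must be on the first or second line of the file.
# It tells Python how to decode the bytes of this source file.

video_name = u"소시.flv"  # a Unicode literal containing two Hangul characters

def describe(name):
    """Return the code points of `name`, to show it was decoded as text."""
    return [hex(ord(ch)) for ch in name]
```

The file must actually be saved as UTF-8 for the declaration to be accurate.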

Upvotes: -1
