Reputation: 304
What I have in mind is iterating through a folder to check whether the file names contain any Cyrillic characters, if they do, rename those files to something else.
How could I do this ?
Upvotes: 4
Views: 2730
Reputation: 24061
There is a library that was written for this: Python's transliterate lib.
So, first you need to get your file names. For that, use os.listdir():
from os import listdir
from os.path import isfile, join
files = [ f for f in listdir(dir) if isfile(join(dir,f)) ]
Now, you can look at each file in files, substitute any characters as needed:
import transliterate
newName = translit(filename, 'ru', reversed=True)
Then just rename the files with os.rename:
os.rename(filename, newName)
Upvotes: 1
Reputation: 7602
Python 3
This one checks each character of the passed string, whether it's in the Cyrillic block and returns True
if the string has a Cyrillic character in it. Strings in Python3 are unicode by default. The function encodes each character to utf-8 and checks whether this yields two bytes matching the table block that contains Cyrillic characters.
def isCyrillic(filename):
for char in filename:
char_utf8 = char.encode('utf-8') # encode to utf-8
if len(char_utf8) == 2 \ # check if we have 2 bytes and if the
and 0xd0 <= char_utf8[0] <= 0xd3\ # first and second byte point to
and 0x80 <= char_utf8[1] <= 0xbf: # Cyrillic block (unicode U+0400-U+04FF)
return True
return False
Same function using ord()
as suggested in comment
def isCyrillicOrd(filename):
for char in filename:
if 0x0400 <= ord(char) <= 0x04FF: # directly checking unicode code point
return True
return False
Test Directory
cycont
|---- asciifile.txt
|---- кириллфайл.txt
|---- украї́нська.txt
|---- संस्कृत.txt
Test
import os
for (dirpath, dirnames, filenames) in os.walk('G:/cycont'):
for filename in filenames:
print(filename, isCyrillic(filename), isCyrillicOrd(filename))
Output
asciifile.txt False False
кириллфайл.txt True True
украї́нська.txt True True
संस्कृत.txt False False
Upvotes: 3
Reputation: 21312
Python 2:
# -*- coding: utf-8 -*-
def check_value(value):
try:
value.decode('ascii')
except UnicodeDecodeError:
return False
else:
return True
Python 3:
Python 3 'str' object doesn't have the attribute 'decode'. So you can use the encode as follows.
# -*- coding: utf-8 -*-
def check_value(value):
try:
value.encode('ascii')
except UnicodeEncodeError:
return False
else:
return True
Then you can gather your file names, and pass them through the check_value function.
Upvotes: 3