Reputation:
I have a number of files hiding in my LANG=en_US.UTF-8 filesystem that have been uploaded with unrecognisable characters in their filenames.
I need to search the filesystem and return all filenames that have at least one character that is not in the standard range (a-zA-Z0-9 and .-_ etc.)
I have been trying the following, but with no luck.
find . | egrep [^a-zA-Z0-9_\.\/\-\s]
I'm using Fedora Core 9.
Upvotes: 9
Views: 14247
Reputation: 1322
I had a similar problem to the OP, for which I was given a solution on Superuser (see also the further comments there) that I found more satisfactory than the "convmv solution", although I was glad to discover convmv too.
Upvotes: -1
Reputation: 536379
find . | egrep [^a-zA-Z0-9_./-\s]
Danger, shell escaping!
bash interprets that last argument itself, stripping one level of backslash escaping before egrep ever sees it. Try putting quotes around the "[^group]" expression.
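A quoted version of the original command might look like this (a sketch: the `LC_ALL=C` setting and the `[[:space:]]` class are substitutions of mine, since `\s` is a GNU extension rather than standard ERE):

```shell
# Quotes keep bash from eating the backslashes; LC_ALL=C makes grep
# compare raw bytes, so each byte of a mangled name can match the
# negated class. [:space:] stands in for the non-portable \s.
find . | LC_ALL=C grep -E '[^a-zA-Z0-9_./[:space:]-]'
```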
Also of course this disallows a lot more than UTF-8. It is possible to construct a regex to match valid UTF-8 strings, but it's rather ugly. If you have Python 2.x available you could take advantage of that:
    import os.path

    def walk(dir):
        for child in os.listdir(dir):
            child = os.path.join(dir, child)
            if os.path.isdir(child):
                for descendant in walk(child):
                    yield descendant
            yield child

    for path in walk('.'):
        try:
            u = unicode(path, 'utf-8')
        except UnicodeError:
            print path  # or attempt to rename the file here
Upvotes: 2
Reputation: 308031
convmv
might be interesting to you. It doesn't just find those files, but also supports renaming them to correct file names (if it can guess what went wrong).
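For illustration, a hedged sketch of typical convmv usage, assuming the broken names are Latin-1 bytes that should have been UTF-8 (the source encoding is a guess; you supply it with -f). The iconv line shows the same conversion done by hand for a single name:

```shell
# convmv dry-runs by default; add --notest to actually rename:
#   convmv -f latin1 -t utf-8 -r .
#   convmv -f latin1 -t utf-8 -r --notest .
# Manual equivalent for one name, via iconv:
old=$(printf 'Ni\xf1o.txt')                  # Latin-1 bytes for "Niño.txt"
new=$(printf '%s' "$old" | iconv -f LATIN1 -t UTF-8)
printf '%s\n' "$new"                         # the repaired UTF-8 name
```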
Upvotes: 17