Reputation: 11091
My script downloads files from the net and saves them under the names supplied by the web server. I need a filter that removes or replaces characters that are invalid in file and folder names on Windows NTFS.
I would be happy with a multi-platform filter too.
NOTE: something like htmlentities would be great.
Upvotes: 5
Views: 6864
Reputation: 14943
filename_string.gsub(/[^\w\.]/, '_')
Explanation: This replaces every character except word characters (letters, digits, underscore) and dots with an underscore.
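To make the effect concrete, here is a minimal sketch of that one-liner wrapped in a method. The method name `sanitize_whitelist` and the sample file names are made up for illustration. Note that in Ruby `\w` only matches ASCII letters, digits and underscore, so accented letters get replaced as well.

```ruby
# Hypothetical helper wrapping the gsub call above.
def sanitize_whitelist(name)
  # Keep ASCII word characters and dots; replace everything else with "_".
  name.gsub(/[^\w.]/, '_')
end

puts sanitize_whitelist('report: draft?.pdf')  # colon, space and "?" are replaced
puts sanitize_whitelist("caf\u00e9.txt")       # the non-ASCII letter is replaced too
```

This is a whitelist, so it is stricter than NTFS requires: spaces, hyphens and non-ASCII letters are all legal on NTFS but are replaced here.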
Upvotes: 15
Reputation: 44080
I don't know how you plan to use those files later, but probably the most reliable solution is to keep the original filenames in a database table (or an otherwise serialized hash) and name the physical files after a unique ID that you (or the database) generate.
PS: Another advantage of this approach is that you don't have to worry about files with the same name (or different names that filter to the same name).
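A rough sketch of that idea, assuming an in-memory Hash stands in for the database table and a UUID serves as the unique ID. `NAME_INDEX` and `register_download` are hypothetical names, not part of any library.

```ruby
require 'securerandom'

# Hypothetical in-memory index (ID => original name).
# A real script would probably persist this in a database table or a file.
NAME_INDEX = {}

# Records the original name and returns a safe name to use on disk.
def register_download(original_name, index = NAME_INDEX)
  id = SecureRandom.uuid # only hex digits and hyphens, valid on any common file system
  index[id] = original_name
  id
end

disk_name = register_download('what: is/this?.html')
puts "#{disk_name} -> #{NAME_INDEX[disk_name]}"
```

The downloaded body would then be written to `disk_name`, and the original name looked up in the index whenever it is needed for display.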
Upvotes: 0
Reputation: 6926
Like Geo said, by using gsub you can easily convert all invalid characters to a valid character. For example:
file_names.map! do |f|
  f.gsub(/[<invalid characters>]/, '_')
end
You need to replace <invalid characters> with all the characters that might appear in your file names but are not allowed on your file system. In the above code each invalid character is replaced with a _.
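Wikipedia tells us that the following characters are not allowed on NTFS:
U+0000 (NUL)
/ (slash)
\ (backslash)
: (colon)
* (asterisk)
? (question mark)
" (quote)
< (less than)
> (greater than)
| (pipe)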
So your gsub call could be something like this:
file_names.map! { |f| f.gsub(/[\x00\/\\:\*\?\"<>\|]/, '_') }
which replaces all the invalid characters with an underscore.
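For reuse, the same character class can be pulled into a constant and a small method. This is only a sketch: `NTFS_INVALID`, `sanitize_ntfs` and the sample names are illustrative, and it covers just the characters in the regex above.

```ruby
# The characters from the regex above: NUL / \ : * ? " < > |
NTFS_INVALID = /[\x00\/\\:*?"<>|]/

# Replaces each invalid character with the given replacement (default "_").
def sanitize_ntfs(name, replacement = '_')
  name.gsub(NTFS_INVALID, replacement)
end

file_names = ['a/b:c.txt', 'what?.html', 'plain.txt']
file_names.map! { |f| sanitize_ntfs(f) }
p file_names
```

Keep in mind that different originals can collapse to the same result (for example `a?b` and `a*b` both become `a_b`), so you may want to check for an existing file before writing.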
Upvotes: 22
Reputation: 96777
I think your best bet would be gsub on the filename. One of the things I know you'll need to delete/replace is the colon (:).
Upvotes: 0