Reputation: 185
I'm attempting to find a way to sanitize/filter file names in a Bash script the exact same way as the sanitize_file_name
function from WordPress works. It has to take a filename string and spit out a clean version that is identical to that function.
You can see the function here.
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-57-generic x86_64)
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
Example input file names
These can be and often are practically anything you can make a filename on any operating system, especially Mac and Windows.
This File + Name.mov
Some, Other - File & Name.mov
ANOTHER FILE 2 NAME vs2_.m4v
some & file-name Alpha.m4v
Some Strange & File ++ Name__.mp4
This is a - Weird -@ Filename!.mp4
Example output file names
These are how the WordPress sanitize_file_name
function makes the examples above.
This-File-Name.mov
Some-Other-File-Name.mov
ANOTHER-FILE-2-NAME-vs2_.m4v
some-file-name-Alpha.m4v
Some-Strange-File-Name__.mp4
[email protected]
It doesn't just have to solve these cases, it has perform the same functions that the sanitize_file_name
function does or it will produce duplicate files and they won't be updated on the site.
Some thoughts I've had are maybe I can somehow use that function itself but this video encoding server doesn't have PHP on it since it's quite a tiny server and normally just encodes videos and uploads them. It doesn't have much memory, CPU power or disk space, it's a DigitalOcean 512MB RAM server. Maybe I can somehow create a remote PHP script on the web server to handle it through HTTP but again I'm not entirely sure how to do that either through Bash.
It's too complicated for my limited Bash skills so I'm wondering if anyone can help or knows where a script is that does this already. I couldn't find one. All I could find are scripts that change spaces or special characters into underscores or dashes. But this isn't all the sanitize_file_name
function does.
In case you are curious, the filenames have to be compatible with this WordPress function because of the way this website is setup to handle videos. It allows people to upload videos through WordPress that are then sent to a separate video server for encoding and then sent to Amazon S3 and CloudFront for serving on the site. However it also allows adding videos through Dropbox using the External Media plugin (which actually is duplicating the video upload with the Dropbox sync now but that's another minor issue). This video server is also syncing to a Dropbox account and whitelisting the folders in it and has this Bash script watching a VideoServer Dropbox folder using inotifywait
which copies videos from it to another folder temporarily where the video encoder encodes them. This way when they update the videos in their Dropbox it will automatically re-encode and update the video shown on the site. They could just upload the files through WordPress but they don't seem to want to or don't know how to do that for some reason.
Upvotes: 6
Views: 4844
Reputation: 1555
Inspired by the answer.
EscapeFilename()
{
printf '%s' "$@" | perl -pe 's/[:;,\?\[\]\/\\=<>''"&\$#*()|~`!{}%+]//g; s/[\s-]+/-/g;';
}
Upvotes: 0
Reputation: 1583
If you have Perl installed, try with:
#!/bin/bash
function sanitize_file_name {
echo -n $1 | perl -pe 's/[\?\[\]\/\\=<>:;,''"&\$#*()|~`!{}%+]//g;' -pe 's/[\r\n\t -]+/-/g;'
}
filename="Wh00t? it's a -- re@lly-weird {file&name} (with + Plus and__1% #of# [\$qRots\$!]).mov"
cleaned=$(sanitize_file_name "$filename")
echo original : "$filename"
echo sanitised: "$cleaned"
Result is:
original : Wh00t? it's a -- re@lly-weird {file&name} (with + Plus and__1% #of# [$qRots$!]).mov
sanitised: Wh00t-it's-a-re@lly-weird-filename-with-Plus-and__1-of-qRots.mov
Looking at WP function, this emulates it quite well.
Upvotes: 5