rykr

Reputation: 185

Bash sanitize_file_name function

I'm trying to find a way to sanitize/filter file names in a Bash script exactly the way WordPress's sanitize_file_name function does. It has to take a filename string and spit out a clean version that is identical to that function's output.

You can see the function in the WordPress source (wp-includes/formatting.php).

GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-57-generic x86_64)
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi

Example input file names
These can be, and often are, practically any filename you can create on any operating system, especially Mac and Windows.

This File + Name.mov  
Some, Other - File & Name.mov  
ANOTHER FILE 2 NAME vs2_.m4v  
some & file-name Alpha.m4v  
Some Strange & File ++ Name__.mp4  
This is a - Weird -@ Filename!.mp4

Example output file names
This is what the WordPress sanitize_file_name function produces for the examples above.

This-File-Name.mov  
Some-Other-File-Name.mov  
ANOTHER-FILE-2-NAME-vs2_.m4v  
some-file-name-Alpha.m4v  
Some-Strange-File-Name__.mp4  
This-is-a-Weird-@-Filename.mp4

It doesn't just have to handle these cases; it has to do everything the sanitize_file_name function does, or it will produce duplicate files and they won't be updated on the site.

One thought I had was to somehow use that function itself, but this video encoding server doesn't have PHP. It's a tiny DigitalOcean server with 512MB of RAM, little CPU power and little disk space, and it normally just encodes videos and uploads them. Maybe I could create a remote PHP script on the web server and call it over HTTP, but again I'm not entirely sure how to do that from Bash either.
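For the remote-PHP idea, the Bash side could be a single curl call. This is only a sketch under assumptions: it presumes a hypothetical script (called sanitize.php here) in the WordPress root on the web server that loads WordPress and echoes sanitize_file_name() of a `name` query parameter. The URL and the function name `remote_sanitize_file_name` are placeholders, not an existing service.

```shell
#!/usr/bin/env bash
# Hypothetical endpoint: a PHP file in the WordPress root, for example
#   <?php require __DIR__ . '/wp-load.php'; echo sanitize_file_name($_GET['name']);
# The default URL below is a placeholder, not a real service.
SANITIZE_URL=${SANITIZE_URL:-https://example.com/sanitize.php}

remote_sanitize_file_name() {
    # -G appends the --data-urlencode value to the URL as a query string,
    # -f fails on HTTP errors, -sS hides the progress meter but shows errors.
    curl -fsS -G --data-urlencode "name=$1" "$SANITIZE_URL"
}
```

Usage would be `clean=$(remote_sanitize_file_name "This File + Name.mov")`. The appeal of this design is that the result comes from the site's own WordPress version, so it stays identical by construction; the cost is an HTTP round trip per file and a dependency on the web server being reachable.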

This is too complicated for my limited Bash skills, so I'm wondering if anyone can help or knows of an existing script that does this. I couldn't find one: all I could find were scripts that change spaces or special characters into underscores or dashes, and that isn't all the sanitize_file_name function does.

In case you are curious, the filenames have to be compatible with this WordPress function because of the way the website is set up to handle videos. It lets people upload videos through WordPress; they are then sent to a separate video server for encoding and then to Amazon S3 and CloudFront for serving on the site. However, it also allows adding videos through Dropbox using the External Media plugin (which is actually duplicating the video upload with the Dropbox sync now, but that's another minor issue). This video server also syncs to a Dropbox account, whitelisting the folders in it, and runs this Bash script, which watches a VideoServer Dropbox folder using inotifywait and temporarily copies videos from it to another folder where the video encoder encodes them. This way, when they update the videos in their Dropbox, the video shown on the site is automatically re-encoded and updated. They could just upload the files through WordPress, but for some reason they don't seem to want to or don't know how.
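The watcher script itself isn't shown, so the following is only a sketch of where a sanitizer could plug into an inotifywait loop, not the actual script. The folder paths are placeholders, the functions `copy_for_encoding` and `watch_videos` are hypothetical, and `sanitize_file_name` is assumed to be defined elsewhere (for example, one of the Perl-based functions from the answers).

```shell
#!/usr/bin/env bash
# Sketch only: paths are placeholders, and sanitize_file_name must already be
# defined (for example, one of the Perl-based functions from the answers).
WATCH_DIR=${WATCH_DIR:-$HOME/Dropbox/VideoServer}
ENCODE_DIR=${ENCODE_DIR:-/tmp/encode}

# Hypothetical helper: copy one video into the encoder folder under a
# sanitized name and print the new path.
copy_for_encoding() {
    local src=$1 clean
    clean=$(sanitize_file_name "$(basename -- "$src")")
    mkdir -p -- "$ENCODE_DIR"
    cp -- "$src" "$ENCODE_DIR/$clean"
    printf '%s\n' "$ENCODE_DIR/$clean"
}

# Hypothetical watcher: -m keeps inotifywait running, -r watches subfolders,
# close_write fires when a file has finished being written, and moved_to
# covers files moved into the folder. '%w%f' prints the full path.
watch_videos() {
    inotifywait -m -r -e close_write -e moved_to --format '%w%f' "$WATCH_DIR" |
        while IFS= read -r path; do
            copy_for_encoding "$path"
        done
}
```

Calling `watch_videos` blocks and handles each file as it arrives. Renaming at copy time means the encoder and the upload both see whatever name the loaded sanitize_file_name implementation produces.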

Upvotes: 6

Views: 4844

Answers (2)

Serious Angel

Reputation: 1555

Inspired by the other answer:

EscapeFilename()
{
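    # Drop WordPress's special characters (\x27 is the apostrophe), then turn runs of whitespace and hyphens into a single "-".
    printf '%s' "$1" | perl -pe 's/[:;,\?\[\]\/\\=<>\x27"&\$#*()|~`!{}%+]//g; s/[\s-]+/-/g;'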
}

Upvotes: 0

Sampisa

Reputation: 1583

If you have Perl installed, try this:

#!/bin/bash

function sanitize_file_name {
    # Quote "$1" so the shell does not word-split or glob-expand the name.
    printf '%s' "$1" | perl -pe 's/[\?\[\]\/\\=<>:;,''"&\$#*()|~`!{}%+]//g;' -pe 's/[\r\n\t -]+/-/g;'
}

filename="Wh00t? it's a -- re@lly-weird {file&name} (with + Plus and__1% #of# [\$qRots\$!]).mov"

cleaned=$(sanitize_file_name "$filename")

echo original : "$filename"
echo sanitised: "$cleaned"

Result is:

original : Wh00t? it's a -- re@lly-weird {file&name} (with + Plus and__1% #of# [$qRots$!]).mov
sanitised: Wh00t-it's-a-re@lly-weird-filename-with-Plus-and__1-of-qRots.mov

Looking at the WP function, this emulates it quite well.

Upvotes: 5
