singmotor

Reputation: 4180

Recursively looking for a list of file types

I want to use bash to remove all the files in a directory that aren't in an associative array of file extensions. (i.e. delete all the files in a directory that aren't image files, for example)

This question very clearly answers how to do this for a single file extension, but I'm not sure how to do it for a whole list.

Currently I'm doing this:

for f in $(find . -type f ! -name '*.png' -and ! -name '*.jpg' ); do rm "$f"; done

but it seems ugly to add a massive list of "-and ! -name '*.aaa'" tests inside the command substitution, one for every file type.

Is there a way to pass find an associative array like

declare -A allowedTypes=([*.png]=1 [*.jpg]=1 [*.gif]=1)

or will I just need to add a lot of "-and ! -name ___"?

Thanks!

Upvotes: 3

Views: 301

Answers (2)

Rayne

Reputation: 2659

Assumption: allowedTypes contains only trusted input and only valid suffixes.

The first snippet supports multi-level suffixes like tar.gz. It uses find, a regular expression and a list of allowed suffixes allowedTypes.

allowedTypes=(png gif jpg)

# keepTypes='png|gif|jpg'
keepTypes="$(echo "${allowedTypes[@]}" | tr ' ' '|')"

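# Dry run: prints the rm commands; drop "echo" to actually delete.
# The dot before the suffix is escaped so that it matches a literal "." only.
find . -type f -regextype awk ! -iregex '.*\.('"$keepTypes"')' -exec echo rm {} \;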
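As a possible alternative to the regular expression, the "! -name" tests from the question can be generated from the array instead of being typed out. This is only a minimal sketch: excludeArgs and demoDir are names made up for the illustration, the demo runs in a throwaway directory, and it only prints what would be deleted. Note that -name is case-sensitive, unlike the -iregex used above.

```bash
#!/usr/bin/env bash
allowedTypes=(png jpg gif)   # suffixes to keep (assumed to be trusted input)

# Build: ! -name '*.png' ! -name '*.jpg' ! -name '*.gif'
excludeArgs=()
for ext in "${allowedTypes[@]}"; do
    excludeArgs+=( ! -name "*.${ext}" )
done

# Throwaway directory so the demo touches nothing real.
demoDir=$(mktemp -d)
mkdir -p "$demoDir/sub"
touch "$demoDir/a.png" "$demoDir/b.txt" "$demoDir/sub/c.jpg" "$demoDir/sub/d.log"

# -print is the dry run; once the list looks right it could be
# replaced by something like: -exec rm -- {} +
toDelete=$(cd "$demoDir" && find . -type f "${excludeArgs[@]}" -print | LC_ALL=C sort)
printf '%s\n' "$toDelete"

rm -rf -- "$demoDir"
```

Because each test is a separate array element, suffixes containing spaces would survive quoting intact, which is harder to guarantee when the list is pasted into a regular expression.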

If you want to keep your associative array, you could use the following snippet. It needs additional work to support multi-level file suffixes, because tr -d '.*' strips every dot and asterisk from the keys.

declare -A allowedTypes=([*.png]=1 [*.jpg]=1 [*.gif]=1)

keepTypes="$(echo "${!allowedTypes[@]}" | tr ' ' '|' | tr -d '.*')"

It would be nice to replace the separators with a shell built-in instead of tr, but I did not find one. ${allowedTypes[@]//\ /test} did not replace the whitespace between the items, because the substitution is applied to each element separately and the separators are not part of any element.
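One built-in option that seems to fit here, offered as a sketch rather than as part of the original answer: "${array[*]}" joins the elements with the first character of IFS, so setting IFS locally inside a function avoids tr altogether. The function name join_by is hypothetical.

```bash
#!/usr/bin/env bash
allowedTypes=(png gif jpg)

# join_by SEP ITEM...  prints the items joined by the single character SEP.
join_by() {
    local IFS="$1"      # IFS is changed only inside this function
    shift
    printf '%s' "$*"    # "$*" joins the arguments with the first character of IFS
}

keepTypes=$(join_by '|' "${allowedTypes[@]}")
echo "$keepTypes"       # prints: png|gif|jpg
```

This only works for a single-character separator, which is all the regular expression above needs.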

Upvotes: 1

Inian

Reputation: 85825

Using find in the first place is not needed here. Bash's own globbing is sufficient for this requirement: with its extended glob support enabled, you can get the file names under recursive paths that don't end with the extensions you want to keep.

The option is extglob, which is enabled with shopt as shown below. You can additionally set nullglob, which makes an unmatched glob expand to zero words instead of being left as-is, and globstar (bash 4 or later), which lets ** recurse through all the directories.

shopt -s extglob nullglob globstar

Now all you need to do is form the glob expression that excludes files of type *.png, *.jpg and *.gif, which you can do as below. We use an array to hold the glob results because, when it is quoted properly and expanded, file names with special characters remain intact.

fileList=(**/!(*.jpg|*.gif|*.png))

The ** pattern recurses through the sub-folders, and !() negates the match so that no name ending in one of the listed extensions is included. Be aware that the pattern also matches directory names. To print the results, just do

printf '%s\n' "${fileList[@]}"
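Since the question asks about a whole list of extensions, the negated pattern can also be assembled from an array rather than written by hand. The following is a sketch under a few assumptions: the extensions contain no spaces or glob characters, and alternatives, pattern and demoDir are illustrative names. The pattern is expanded from an unquoted variable, so extglob only has to be enabled by the time the glob is expanded.

```bash
#!/usr/bin/env bash
shopt -s extglob nullglob globstar

allowedTypes=(jpg gif png)   # extensions to keep (assumed: no spaces or glob characters)

# Build "*.jpg|*.gif|*.png" and wrap it in the !() negation.
alternatives=
for ext in "${allowedTypes[@]}"; do
    alternatives+="${alternatives:+|}*.$ext"
done
pattern="!($alternatives)"

# Throwaway directory so the demo touches nothing real.
demoDir=$(mktemp -d)
mkdir -p "$demoDir/sub"
touch "$demoDir/a.png" "$demoDir/b.txt" "$demoDir/sub/c.jpg" "$demoDir/sub/d.log"

# $pattern is deliberately unquoted so that it is treated as a glob.
fileList=( "$demoDir"/**/$pattern )

# Show the matches relative to the demo directory; the directory "sub" is among them.
matches=$(printf '%s\n' "${fileList[@]#"$demoDir"/}" | LC_ALL=C sort)
printf '%s\n' "$matches"

rm -rf -- "$demoDir"
```

The match on sub illustrates that the negated glob selects anything whose name lacks the listed extensions, directories included.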

If your intention is, for example, to remove all the files identified, you don't need to store the glob results in an array. The array approach is useful in shell scripts that need to reuse the results of the glob, but for deleting the files you can pass the expression to rm directly.

First check that the files returned are as expected; once you have confirmed that, you can run rm on the same expression. Use ls to see if the files are listed as expected

ls -1 -- **/!(*.jpg|*.gif|*.png)

and now after confirming the files to delete, do rm at your own risk.

rm -- **/!(*.jpg|*.gif|*.png)
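Because the glob also matches directories, rm without -r reports an error for each of them. A more defensive variant, sketched below, loops over the globstar results, skips anything that is not a regular file, and compares the extension case-insensitively. It assumes bash 4 or later (for globstar and ${f,,}); unwanted and demoDir are illustrative names, and the demo runs in a throwaway directory.

```bash
#!/usr/bin/env bash
shopt -s globstar nullglob

# Throwaway directory so the demo touches nothing real.
demoDir=$(mktemp -d)
mkdir -p "$demoDir/sub"
touch "$demoDir/a.png" "$demoDir/b.txt" "$demoDir/sub/c.JPG" "$demoDir/sub/d.log"

unwanted=()
for f in "$demoDir"/**/*; do
    [[ -f $f ]] || continue            # skip directories and other non-regular files
    case ${f,,} in                     # lowercased copy, so .JPG is treated like .jpg
        *.png|*.jpg|*.gif) ;;          # allowed extension: keep the file
        *) unwanted+=("$f") ;;         # anything else is a deletion candidate
    esac
done

# Review the candidates first, then delete them.
printf 'would remove: %s\n' "${unwanted[@]#"$demoDir"/}"
if (( ${#unwanted[@]} > 0 )); then
    rm -- "${unwanted[@]}"
fi

remaining=$(cd "$demoDir" && find . -type f | LC_ALL=C sort)
rm -rf -- "$demoDir"
```

Like the plain glob, this does not see hidden files unless dotglob is also set.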

Upvotes: 3
