Tolgay Toklar

Reputation: 4343

Shell script to delete files whose names are not in a text file

I have a txt file which contains a list of file names

Example:

10.jpg
11.jpg
12.jpeg
...

In a folder, the listed files should be protected from deletion, and all other files should be deleted.

So I want the opposite logic of this question: Shell command/script to delete files whose names are in a text file

How to do that?

Upvotes: 2

Views: 388

Answers (4)

agc

Reputation: 8406

Provided there are no spaces or special escaped chars in the file names, either of these (or variations of them) would work:

  1. rm -v $(stat -c %n * | sort - excluded_file_list | uniq -u)

  2. stat -c %n * | grep -vxFf excluded_file_list | xargs rm -v

In the first command, sort - excluded_file_list merges the directory listing from stdin with the list file, and uniq -u keeps only the names appearing once, i.e. the files not in the list (this assumes every listed file actually exists). In the second, -x matches whole lines and -F disables regex interpretation, so a name like 12.jpeg cannot accidentally match 12ajpeg.
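The grep variant can be exercised in a throwaway directory; a self-contained sketch follows (the /tmp path and file names are invented for the demo, and GNU xargs is assumed for -d). Note that the keep list names itself so it survives, too:

```shell
# Demo of the grep-based variant in a scratch directory.
# Assumes GNU xargs (-d '\n') and names without newlines.
mkdir -p /tmp/keepdemo && cd /tmp/keepdemo
touch 10.jpg 11.jpg 12.jpeg 13.jpg 14.jpg
# The keep list protects itself as well.
printf '%s\n' 10.jpg 11.jpg 12.jpeg keep.txt > keep.txt
# -x: whole-line match, -F: fixed strings (no regex surprises)
printf '%s\n' * | grep -vxFf keep.txt | xargs -r -d '\n' rm -v
```

After the run, only 10.jpg, 11.jpg, 12.jpeg and keep.txt remain.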

Upvotes: 1

zhenguoli

Reputation: 2308

Use extglob and Bash extended pattern matching !(pattern-list):

!(pattern-list)
Matches anything except one of the given patterns
where a pattern-list is a list of one or more patterns separated by a |.

extglob
If set, the extended pattern matching features described above are enabled.

So for example:

$ ls
10.jpg  11.jpg  12.jpeg  13.jpg  14.jpg  15.jpg  16.jpg  a.txt
$ shopt -s extglob
$ shopt | grep extglob
extglob         on
$ cat a.txt
10.jpg
11.jpg
12.jpeg
$ tr '\n' '|' < a.txt
10.jpg|11.jpg|12.jpeg|
$ ls !(`tr '\n' '|' < a.txt`)
13.jpg  14.jpg  15.jpg  16.jpg  a.txt

So in this example, running rm on that pattern would delete 13.jpg, 14.jpg, 15.jpg, 16.jpg and a.txt — note that a.txt itself is not protected unless you add it to the list.

So with extglob and !(pattern-list), we can match exactly the files that are not listed in the file.
Additionally, if you also want hidden entries (names starting with .) to be matched — and therefore deleted — you can turn on the dotglob option with shopt -s dotglob.
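As a runnable version of the listing above (scratch paths invented for the demo), the same pattern can be handed straight to rm. Using bash -O extglob enables the option before the command line is parsed, and the list file's own name is appended so it survives:

```shell
# Demo: delete everything *not* listed in a.txt (plus a.txt itself).
mkdir -p /tmp/extglobdemo && cd /tmp/extglobdemo
touch 10.jpg 11.jpg 12.jpeg 13.jpg 14.jpg
printf '%s\n' 10.jpg 11.jpg 12.jpeg > a.txt
# Builds and runs: rm -v -- !(10.jpg|11.jpg|12.jpeg|a.txt)
bash -O extglob -c 'rm -v -- !('"$(tr '\n' '|' < a.txt)"'a.txt)'
```

Only 13.jpg and 14.jpg are removed; the listed files and a.txt remain.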

Upvotes: 3

George Vasiliou

Reputation: 6335

This is one way that will work with bash GLOBIGNORE:

$ cat file2
10.jpg
11.jpg
12.jpg
$ ls *.jpg
10.jpg  11.jpg  12.jpg  13.jpg
$ echo $GLOBIGNORE

$ GLOBIGNORE=$(tr '\n' ':' <file2 )
$ echo $GLOBIGNORE
10.jpg:11.jpg:12.jpg:

$ ls *.jpg
13.jpg

As is obvious, globbing ignores whatever (file, pattern, etc.) is included in the GLOBIGNORE bash variable.

This is why the last ls reports only 13.jpg, since 10.jpg, 11.jpg and 12.jpg are ignored.

As a result, rm *.jpg will remove only 13.jpg on my system:

$ rm -iv *.jpg
rm: remove regular empty file '13.jpg'? y
removed '13.jpg'

When you are done, you can just set GLOBIGNORE to null:

$ GLOBIGNORE=

It is worth mentioning that GLOBIGNORE can also hold glob patterns instead of single filenames, like *.jpg or my*.mp3, etc.
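Put together as a script, the GLOBIGNORE approach looks like this — a minimal sketch in a throwaway directory, with file names invented for the demo (wrapped in a bash heredoc since GLOBIGNORE is bash-specific):

```shell
# Demo of GLOBIGNORE-protected deletion in a scratch directory.
bash <<'BASH'
mkdir -p /tmp/gidemo && cd /tmp/gidemo
touch 10.jpg 11.jpg 12.jpg 13.jpg
printf '%s\n' 10.jpg 11.jpg 12.jpg > file2
GLOBIGNORE=$(tr '\n' ':' < file2)   # "10.jpg:11.jpg:12.jpg:"
rm -v -- *.jpg                      # only 13.jpg still matches
BASH
```

The trailing colon leaves an empty pattern in GLOBIGNORE, which is harmless here.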

Alternative:
We can use programming techniques (grep, awk, etc.) to compare the file names present in the ignore file with the files under the current directory:

$ awk 'NR==FNR{f[$0];next}(!($0 in f))' file2 <(find . -type f -name '*.jpg' -printf '%f\n')
13.jpg

$ awk 'NR==FNR{f[$0];next}(!($0 in f))' file2 <(find . -type f -name '*.jpg' -printf '%f\n') | xargs -r -d '\n' rm -v
removed '13.jpg'

Note: This also makes use of bash process substitution and of GNU xargs (-d '\n' keeps names with spaces intact), and it will still break if filenames include newlines.
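If file names may contain newlines, a NUL-delimited variant avoids the problem entirely; this sketch assumes GNU find, grep and xargs, with demo paths invented:

```shell
# NUL-delimited throughout, so any legal file name is handled.
mkdir -p /tmp/nuldemo && cd /tmp/nuldemo
touch 10.jpg 11.jpg 13.jpg
printf '%s\n' 10.jpg 11.jpg > file2
# -z: NUL-separated records; the patterns in file2 stay one per line
find . -maxdepth 1 -type f -name '*.jpg' -printf '%f\0' |
  grep -zvxFf file2 | xargs -0 -r rm -v --
```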

Upvotes: 2

5gon12eder

Reputation: 25419

Another alternative to George Vasiliou's answer would be to read the file with the names of the files to keep using the Bash builtin mapfile and then check for each of the files to be deleted whether it is in that list.

#! /bin/bash -eu

mapfile -t keepthose <keepme.txt
declare -a deletethose

for f in "$@"
do
    keep=0
    for not in "${keepthose[@]}"
    do
        [ "${not}" = "${f}" ] && keep=1 || :
    done
    [ ${keep} -gt 0 ] || deletethose+=("${f}")
done

# Remove the 'echo' if you really want to delete files.
echo rm -f "${deletethose[@]}"

The -t option causes mapfile to trim the trailing newline character from the lines it reads from the file. No other white-space will be trimmed, though. This might be what you want if your file names actually contain white-space but it could also cause subtle surprises if somebody accidentally puts a space before or after the name of an important file they want to keep.

Note that I'm first building a list of the files that should be deleted and then delete them all at once rather than deleting each file individually. This saves some sub-process invocations.

The lookup in the list, as coded above, has linear complexity, which gives the overall script quadratic complexity (precisely, N × M where N is the number of command-line arguments and M the number of entries in the keepme.txt file). If you only have a few dozen files, this should be fine. For larger sets, a Bash associative array keyed by file name (keys may be arbitrary strings, not just identifiers) gives constant-time membership tests; if you are concerned with performance for many files, a more powerful language like Python might also be worth consideration.
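For larger lists, the linear scan can be replaced by an associative-array lookup (Bash 4+; array keys may be arbitrary strings, not just identifiers). A minimal sketch with invented file names, wrapped in a bash heredoc since declare -A is bash-specific:

```shell
# Sketch: constant-time membership via a Bash associative array.
bash <<'BASH'
mkdir -p /tmp/assocdemo && cd /tmp/assocdemo
touch 10.jpg 11.jpg 13.jpg
printf '%s\n' 10.jpg 11.jpg > keepme.txt
declare -A keep
while IFS= read -r name; do keep["$name"]=1; done < keepme.txt
for f in *.jpg; do
    # Delete only names absent from the keep table.
    [ -n "${keep[$f]:-}" ] || rm -v -- "$f"
done
BASH
```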

I would also like to mention that the above example simply compares strings. It will not realize that important.txt and ./important.txt are the same file, and may hence delete a file you meant to keep. It would be more robust to convert each file name to a canonical path using readlink -f before comparing it.

Furthermore, your users might want to be able to put globbing patterns (like important.*) into the list of files to keep. If you want to handle those, extra logic would be required.

Overall, specifying which files to not delete seems a little dangerous, since any mistake errs on the destructive side: a name missing from the list gets its file deleted.

Upvotes: 1
