Hakim

Reputation: 11700

Remove files not containing a specific string

I want to find the files that do not contain a specific string (in a directory and its sub-directories) and remove those files. How can I do this?

Upvotes: 21

Views: 30600

Answers (8)

John Marshall

Reputation: 324

Another solution (although not as fast). The top solution didn't work in my case because the string I needed to use in place of 'my string' contains special characters.

find . -type f ! -name "*my string*" -exec rm {} \; -print
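
One caveat worth checking before running the command above: -name tests the file name, whereas the question asks about file contents. The following is a small sandbox sketch of the difference. The temporary directory and file names are made up for illustration, and GNU grep is assumed for -L.

```shell
# Sandbox demo: -name tests the file NAME, grep -L tests the file CONTENTS.
demo=$(mktemp -d)
printf 'nothing relevant\n' > "$demo/my string.txt"   # name matches, contents do not
printf 'has my string\n'    > "$demo/other.txt"       # contents match, name does not

# Files whose name lacks the string:
by_name=$(cd "$demo" && find . -type f ! -name '*my string*')
# Files whose contents lack the string (|| true: ignore grep's exit status):
by_content=$(cd "$demo" && grep -rL 'my string' . || true)

echo "name test selects:    $by_name"
echo "content test selects: $by_content"
rm -rf "$demo"
```

The two tests select opposite files here, so it is worth confirming which one you actually need before deleting anything.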

Upvotes: 1

jeffpkamp

Reputation: 2866

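This worked for me. Note that the pattern is tested against the file path, not the file contents. The -f test restricts deletion to regular files; without it, rm would also be attempted on directories, which fails unless you add -r.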

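myString="keepThis"
# Read NUL-delimited paths from find so names with spaces or newlines survive.
while IFS= read -r -d '' x; do
    # Remove regular files whose path does not match the regex in $myString.
    if [[ -f $x && ! $x =~ $myString ]]; then
        rm -- "$x"
    fi
done < <(find . -print0)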

Upvotes: 0

Ian Macalinao

Reputation: 1668

One possibility is

find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;

You can remove the echo if the output of this preview looks correct.

The equivalent with -delete is

find . -type f '!' -exec grep -q "my string" {} \; -delete

but then you don't get the nice preview option.

Upvotes: 4

estebancod

Reputation: 61

To remove files whose names do not contain a specific string:

Bash:

To use the extended patterns below, enable the extglob shell option as follows:

shopt -s extglob

Then remove all files whose names do not contain the string "fix":

rm !(*fix*)

To keep the files whose names contain "fix" or "class" and delete the rest:

rm !(*fix*|*class*)

Zsh:

To use the extended patterns below, enable the extended_glob zsh option as follows:

setopt extended_glob

Remove all files whose names do not contain the string, in this example "fix":

rm -- ^*fix*

To keep the files whose names contain "fix" or "class" and delete the rest:

rm -- ^(*fix*|*class*)

The same approach works for extensions; you only need to change the glob pattern, for example *.zip or *.doc.

Here are the sources:

https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/

https://codeday.me/es/qa/20190819/1296122.html

Upvotes: 6

Nick

Reputation: 2050

The following will work:

find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm

This first uses find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).

Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.

The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the filenames of those files on its command line (that xargs has added) which don't match the regular expression:

my string

If you want simple matching without regular expression magic then add the -F option. If you want more powerful regular expressions then give a -E argument. It's a good habit to use single quotes rather than double quotes, as this protects you against any shell magic being applied to the string (such as variable substitution).
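
To illustrate the difference -F makes, here is a minimal sandbox sketch. The file names and the search string '1.5' are invented for the example, and GNU grep is assumed for -r and -L.

```shell
# Sandbox demo: regex matching vs. fixed-string matching (-F) with grep -L.
demo=$(mktemp -d)
printf 'version 1.5\n' > "$demo/literal.txt"    # contains the literal text "1.5"
printf 'version 1x5\n' > "$demo/lookalike.txt"  # matches only if "." is a wildcard

# As a regex, "." matches any character, so both files match and none is listed.
regex_list=$(cd "$demo" && grep -rL '1.5' . || true)
# With -F the dot is literal, so lookalike.txt is listed as NOT containing it.
fixed_list=$(cd "$demo" && grep -rL -F '1.5' . || true)

echo "regex: [$regex_list]"
echo "fixed: [$fixed_list]"
rm -rf "$demo"
```

Without -F, lookalike.txt would be treated as containing the string and so would survive the deletion, which is probably not what you want when the string has characters such as ".", "*" or "[".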

Finally you call xargs again to get rid of all the files that you've found with the previous calls.

The problem with calling grep directly from the find command with the -exec ... \; form is that grep then gets invoked once per file rather than once for a whole batch of files as xargs does. Batching is much faster if you have lots of files. Also don't be tempted to do stuff like:

rm $(some command that produces lots of filenames)

It's better to pipe the names to xargs, which knows the maximum command-line length and will call rm as many times as needed, each time with as many arguments as will fit.

Note that this solution would have been simpler without the need to cope with files containing white space and new lines.
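
As a rough illustration of why the null-terminated form matters, the sketch below runs the same pipeline in a throwaway directory with awkward file names. The directory layout is invented for the demo. It uses -0 (the short form of --null) and adds GNU xargs's -r so that rm is not run at all when no file is listed.

```shell
# Sandbox demo of the NUL-delimited pipeline with awkward file names.
demo=$(mktemp -d)
printf 'my string is here\n' > "$demo/keep me.txt"
printf 'nothing to see\n'    > "$demo/remove me.txt"
mkdir "$demo/sub dir"
printf 'nothing either\n'    > "$demo/sub dir/$(printf 'two\nlines').txt"

# -print0, -Z and -0 keep each name intact; -r skips rm when nothing is listed.
find "$demo" -type f -print0 | xargs -0 grep -Z -L 'my string' | xargs -0 -r rm --

remaining=$(find "$demo" -type f)
echo "left: $remaining"
rm -rf "$demo"
```

Only the file containing the string should be left, even though the other names contain spaces and a newline.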

Alternatively

grep -r -L -Z 'my string' . | xargs --null rm

will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to do some other tests on the files as well (such as age or permissions).

Note that any of the single letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.

Upvotes: 17

rodion

Reputation: 15029

EDIT: This is how you SHOULD NOT do this! Reason is given here. Thanks to @ormaaj for pointing it out!

find . -type f | grep -v "exclude string" | xargs rm

Note: the grep pattern is matched against the full file path relative to the current directory (see the output of find . -type f).

Upvotes: 5

ormaaj

Reputation: 6577

GNU grep and bash.

grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done

Use a find-based solution if portability is needed; this one is slightly faster.
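
If the string comes from a variable, it may help to guard against values that start with a dash or contain regex characters. The sketch below wraps the same grep -rLZ idea in a hypothetical helper named remove_files_without (not part of any tool). It swaps the while/read loop for xargs -0 so that it also runs in a plain POSIX shell, and it assumes GNU grep and GNU xargs.

```shell
# Hypothetical helper: remove files under DIR whose contents lack the literal STRING.
# Assumes GNU grep (-r, -L, -Z) and an xargs that supports -0 and -r.
remove_files_without() (
    str=$1
    dir=$2
    # -F: treat $str literally; -e: safe even if $str starts with "-".
    grep -rLZ -F -e "$str" -- "$dir" | xargs -0 -r rm --
)

demo=$(mktemp -d)
printf 'flag -v is set\n' > "$demo/keep.txt"
printf 'no flags\n'       > "$demo/drop.txt"

remove_files_without '-v' "$demo"
remaining=$(ls "$demo")
echo "left: $remaining"
rm -rf "$demo"
```

Without -e, a value such as '-v' would be parsed by grep as an option rather than as the text to search for.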

Upvotes: 5

Alan Curry

Reputation: 14711

I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.

find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm

This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.
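
A dry run along those lines might look like the following sketch, run in a throwaway directory. The file names are invented for the demo, and GNU grep and xargs are assumed, as above.

```shell
# Sandbox dry run: print the rm command instead of executing it.
demo=$(mktemp -d)
printf 'yourstring inside\n' > "$demo/keep.txt"
printf 'something else\n'    > "$demo/drop.txt"

# Same pipeline as above, with "echo rm" standing in for "rm".
preview=$(find "$demo" -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' echo rm)
echo "$preview"

# Nothing has been deleted yet; both files are still there.
count=$(find "$demo" -type f | wc -l)
echo "files still present: $count"
rm -rf "$demo"
```

Once the printed command lists only the files you expect, drop the echo to perform the deletion.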

Upvotes: 1
