Reputation: 11700
I want to find the files that do not contain a specific string (in a directory and its sub-directories) and remove those files. How can I do this?
Upvotes: 21
Views: 30600
Reputation: 324
Another solution (although not as fast). The top solution didn't work in my case because the string I needed to use in place of 'my string' has special characters. Note that -name tests the file name, not the file contents.
find -type f ! -name "*my string*" -exec rm {} \; -print
Upvotes: 1
Reputation: 2866
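This worked for me. It matches against each file's path, not its contents. The -f test restricts it to regular files; rm without -r will not delete directories in any case.
myString="keepThis"
find ./ -print0 | while IFS= read -r -d '' x
do if [[ -f $x && ! $x =~ $myString ]]
then rm -- "$x"
fi
done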
Upvotes: 0
Reputation: 1668
One possibility is
find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;
You can remove the echo if the output of this preview looks correct.
The equivalent with -delete is
find . -type f '!' -exec grep -q "my string" {} \; -delete
but then you don't get the nice preview option.
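The preview-then-delete workflow can be tried safely in a throwaway directory first. This is only a sketch: the `demo` directory and the file names are made up for illustration, and it assumes `mktemp -d` is available.

```shell
# Throwaway directory so nothing real is touched (names are illustrative).
demo=$(mktemp -d)
printf 'contains my string\n' > "$demo/keep.txt"
printf 'unrelated text\n' > "$demo/drop.txt"

# Preview: print files whose contents do NOT match, without deleting anything.
preview=$(find "$demo" -type f ! -exec grep -q 'my string' {} \; -print)
echo "would remove: $preview"

# Delete for real once the preview looks right.
find "$demo" -type f ! -exec grep -q 'my string' {} \; -exec rm -- {} \;

remaining=$(ls "$demo")
echo "left: $remaining"
rm -rf "$demo"
```

Using -print for the preview keeps the test part of the command identical between the dry run and the real run, so only the final action changes.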
Upvotes: 4
Reputation: 61
To remove files whose names do not contain a specific string (these patterns match file names in the current directory, not file contents):
Bash:
Extended glob patterns are needed; enable the extglob shell option as follows:
shopt -s extglob
Then remove all files whose names don't contain the string "fix":
rm !(*fix*)
To keep the files whose names contain either "fix" or "class" and delete the rest:
rm !(*fix*|*class*)
Zsh:
Enable the extended_glob zsh option as follows:
setopt extended_glob
Remove all files whose names don't contain the string, in this example "fix":
rm -- ^*fix*
To keep the files whose names contain either "fix" or "class" and delete the rest:
rm -- ^(*fix*|*class*)
The same approach works for extensions; only the glob pattern needs to change: (*.zip), (*.doc), etc.
Here are the sources:
https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/
https://codeday.me/es/qa/20190819/1296122.html
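Before running rm with a negated glob, it can help to print what the pattern expands to. The sketch below does that for the Bash pattern in a scratch directory; the file names and the `demo` variable are invented for the example, and it starts a child bash with -O extglob so the option is already active when the pattern is parsed.

```shell
# Scratch directory with illustrative file names.
demo=$(mktemp -d)
touch "$demo/fix_a.txt" "$demo/class_b.txt" "$demo/other.txt"

# Expand the negated pattern without deleting anything.
# bash -O extglob turns the option on at startup, before the command is parsed.
preview=$(cd "$demo" && bash -O extglob -c 'printf "%s\n" !(*fix*|*class*)')
echo "rm would receive: $preview"

rm -rf "$demo"
```

Only names matching neither *fix* nor *class* are printed, which is the list rm would be given.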
Upvotes: 6
Reputation: 2050
The following will work:
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
This first uses find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).
Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.
The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the names of those files on its command line (that xargs has added) which don't match the regular expression:
my string
If you want simple matching without regular expression magic then add the -F option. If you want more powerful regular expressions then give a -E argument. It's a good habit to use single quotes rather than double quotes, as this protects you against any shell magic being applied to the string (such as variable substitution).
Finally you call xargs again to get rid of all the files that you've found with the previous calls.
The problem with calling grep directly from the find command with the -exec ... \; form is that grep then gets invoked once per file rather than once for a whole batch of files as xargs does. Batching is much faster if you have lots of files. Also don't be tempted to do stuff like:
rm $(some command that produces lots of filenames)
It's always better to pass it to xargs, as this knows the maximum command-line limits and will call rm multiple times, each time with as many arguments as it can.
Note that this solution would have been simpler without the need to cope with files containing white space and new lines.
Alternatively
grep -r -L -Z 'my string' . | xargs --null rm
will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to do some other tests on the files as well (such as age or permissions).
Note that any of the single-letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.
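To see the null-terminated pipeline cope with awkward names, here is a small sketch run against a temporary directory. The directory, the file names, and the added -F flag (literal matching, as described above) are choices made for this example; GNU grep and xargs are assumed.

```shell
# Temporary tree with spaces in the file names (all names are illustrative).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'has my string in it\n' > "$demo/keep me.txt"
printf 'nothing relevant\n'    > "$demo/drop me.txt"
printf 'nothing relevant\n'    > "$demo/sub/also drop.txt"

# find emits NUL-terminated names, grep -L -Z lists non-matching files
# NUL-terminated, and the last xargs hands them to rm in batches.
find "$demo" -type f -print0 \
  | xargs --null grep -F -Z -L 'my string' \
  | xargs --null rm --

remaining=$(find "$demo" -type f)
echo "left: $remaining"
rm -rf "$demo"
```

Only the file whose contents include the string survives, even though every name contains a space.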
Upvotes: 17
Reputation: 15029
EDIT: This is how you SHOULD NOT do this! Reason is given here. Thanks to @ormaaj for pointing it out!
find . -type f | grep -v "exclude string" | xargs rm
Note: the grep pattern will match against the full file path from the current directory (see the find . -type f output).
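One well-known hazard of newline-delimited pipelines like this one is that xargs splits its input on whitespace by default, so a name with a space becomes two arguments. The sketch below only counts arguments and deletes nothing; the `demo` directory and file name are made up, and the -0 option of xargs (the short form of --null) is assumed to be available.

```shell
# One file whose name contains a space (name is illustrative).
demo=$(mktemp -d)
touch "$demo/two words.txt"

# Default xargs splitting: the single name is passed as two arguments.
unsafe=$(cd "$demo" && find . -type f | xargs printf '[%s]\n' | wc -l)

# NUL-delimited: the name arrives as one argument.
safe=$(cd "$demo" && find . -type f -print0 | xargs -0 printf '[%s]\n' | wc -l)

echo "arguments without -print0: $unsafe, with -print0: $safe"
rm -rf "$demo"
```

With rm in place of printf, the unsafe form would try to delete ./two and words.txt, neither of which is the intended file.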
Upvotes: 5
Reputation: 6577
GNU grep and bash.
grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done
Use a find solution if portability is needed. This is slightly faster.
Upvotes: 5
Reputation: 14711
I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm
This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.
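The "echo in place of rm" check might look like the following sketch, run here against a temporary directory so it is harmless. The directory and file names are invented for the example, and GNU grep and xargs are assumed, as in the answer.

```shell
# Temporary directory with one matching and one non-matching file (illustrative names).
demo=$(mktemp -d)
printf 'yourstring is in here\n' > "$demo/match.txt"
printf 'something else\n'        > "$demo/nomatch.txt"

# Dry run: echo the rm command line instead of executing it.
would_run=$(find "$demo" -type f -exec grep -F -L 'yourstring' '{}' + \
  | xargs -d '\n' echo rm)
echo "$would_run"

rm -rf "$demo"
```

Once the echoed command lists only the files you expect, drop the echo to perform the deletion.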
Upvotes: 1