Reputation: 2614
I have a directory of about 5000 files of which some were erroneously written with a syntax error. I am using the following code to identify which files have the error:
ls -1 | while read -r a; do grep -q '^- ' "$a" || echo "$a"; done
I initially tried to use a combination of `find` and `xargs`, but I couldn't figure out how to add the boolean logic I needed.
My use case is not I/O-bound and completes fast enough, but I was curious to see whether this same operation could be done without relying on a bash loop. Although comfortable with Bash, I have a tendency to rely heavily on piping into loops, which often leads to mind-numbingly slow performance.
Upvotes: 4
Views: 698
Reputation: 438143
Using `grep` alone is sufficient:
grep -d skip -L '^- ' *
Note: Unlike `find`, this will not automatically include hidden files.
To search recursively, use `grep -L '^- ' -R .` instead (although `-R` is not POSIX-compliant, it works with both GNU and BSD/macOS `grep`).
`-L`, as described in Jamil Said's helpful answer, prints the path (as specified) of each input file that does not contain the search term.
`-d skip` skips directories (while option `-d` is not POSIX-compliant, it is supported by both GNU and BSD/macOS `grep`).
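To see how `-L` inverts the usual matching behavior, here is a minimal sketch in a scratch directory (the filenames `good.txt` and `bad.txt` are made up for illustration):

```shell
# Set up a scratch directory with one correctly written file and one broken one.
tmpdir=$(mktemp -d)
printf '%s\n' '- valid entry' > "$tmpdir/good.txt"
printf '%s\n' 'broken entry'  > "$tmpdir/bad.txt"

# -L prints only the names of files that contain NO line matching '^- ',
# so only the erroneous file is reported.
(cd "$tmpdir" && grep -d skip -L '^- ' *)   # prints: bad.txt

rm -rf "$tmpdir"
```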
Caveat: As hek2mgl points out in a comment, the command line that results after filename expansion of `*` may be too long, resulting in an error such as `/usr/bin/grep: Argument list too long`.
(By contrast, if you make `grep` search recursively with `-R .`, you won't face this problem.)
The maximum length is platform-specific and can be queried with `getconf ARG_MAX`, though note that the actual limit is lower than that, depending on the size of your environment - see this article.
In practice, 5000 files will likely not be a problem, even on platforms with a relatively low maximum length, such as macOS - unless you have exceptionally long filenames and/or your globbing pattern has a lengthy path component[1].
Recent Linux versions have a much higher limit.
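As a quick way to check your own platform's headroom (the numbers printed will vary by system):

```shell
# Total space available for argv + environment, in bytes.
getconf ARG_MAX

# Rough estimate of what remains for arguments after the current
# environment is deducted. (env | wc -c) is only an approximation;
# the kernel also counts pointers and per-string overhead.
echo $(( $(getconf ARG_MAX) - $(env | wc -c) ))
```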
If you do hit the limit and must work around it, use `xargs` as follows:
printf '%s\0' * | xargs -0 grep -d skip -L '^- '
Note that while `-0` to read NUL-terminated input is not POSIX-compliant, it is supported by both GNU and BSD/macOS `xargs`.
If the input filenames indeed don't fit on a single command line, `xargs` will partition the input in a way that results in the fewest `grep` invocations necessary to process all of them.
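The batching behavior can be made visible by artificially capping the number of arguments per invocation with `-n` (here `echo` stands in for `grep`, and the five names are made up):

```shell
# Five NUL-terminated names, at most two per invocation:
# xargs runs echo three times (2 + 2 + 1 arguments), printing
#   one two
#   three four
#   five
printf '%s\0' one two three four five | xargs -0 -n 2 echo
```

Without `-n`, `xargs` instead packs as many arguments as fit below the platform limit, which is why it minimizes the number of `grep` invocations.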
[1] macOS 10.12 has a limit of 262,144 bytes (256 KB); if we conservatively assume that, after deducting the size of the environment and the fixed part of the command line, we get 250,000 bytes for our filename list, this gives us 250000 / 5000 == 50 bytes per filename + separator, so that each filename is allowed to be up to 49 bytes long.
By contrast, Ubuntu 16.04's limit is 8 times higher: 2,097,152 bytes (2 MB).
Upvotes: 2
Reputation: 2093
Here is another way to do this, using `grep -L`:
find . -maxdepth 1 -type f -exec grep -L '^- ' {} \;
The code above lists all files in the directory which do NOT contain a line starting with `- ` (dash + space) in their contents.
To make the code recursive (that is, to extend the search to all subdirectories), just remove the `-maxdepth 1` part.
From `man grep`, about option `-L`:
-L, --files-without-match
    Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.
Upvotes: 2
Reputation: 158040
You can use boolean logic with `find`:
find . -maxdepth 1 -type f \( -exec grep -q '^- ' {} \; -o -print \)
The `-o` operator is a logical OR. If the command executed by `-exec` returns a non-zero exit status, `-print` prints the filename.
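A sketch of that short-circuit evaluation in a scratch directory (the filenames are made up for illustration):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '- ok'  > "$tmpdir/good.txt"
printf '%s\n' 'oops'  > "$tmpdir/bad.txt"

# For each file: if grep -q succeeds (the file contains a '^- ' line),
# the whole parenthesized expression is already true and -o short-circuits;
# otherwise -print reports the file. Only bad.txt is printed.
find "$tmpdir" -maxdepth 1 -type f \( -exec grep -q '^- ' {} \; -o -print \)

rm -rf "$tmpdir"
```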
Upvotes: 3