Zhro

Reputation: 2614

Can logical operators be used with find and xargs?

I have a directory of about 5000 files of which some were erroneously written with a syntax error. I am using the following code to identify which files have the error:

ls -1 | while read -r a; do grep -q '^- ' "$a" || echo "$a"; done

I initially tried to use a combination of find and xargs but I couldn't figure out how to add the boolean logic I needed.

My use case is not I/O bound and completes fast enough. But I was curious to see whether this same operation could be done without relying on a bash loop. Although I'm comfortable with Bash, I tend to rely heavily on piping into loops, which often leads to mind-numbingly slow performance.

Upvotes: 4

Views: 698

Answers (3)

mklement0

Reputation: 438143

Use of grep alone is sufficient:

grep -d skip -L '^- ' *

Note: Unlike find, this will not automatically include hidden files.
To search recursively, use grep -L '^- ' -R . instead (although -R is not POSIX-compliant, it works with both GNU and BSD/macOS grep).

-L, as described in Jamil Said's helpful answer, prints the path (as specified) of each input file that does not contain the search term.

-d skip skips directories (while option -d is not POSIX-compliant, it is supported by both GNU and BSD/macOS grep).


Caveat: As hek2mgl points out in a comment, the command line that results after filename expansion of * may be too long, resulting in an error such as /usr/bin/grep: Argument list too long.
(By contrast, if you make grep search recursively with -R ., you won't face this problem.)

The max. length is platform-specific, and can be queried with getconf ARG_MAX, though note that the actual limit is lower than that, depending on the size of your environment - see this article.
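To see the numbers on your own system, you can query the raw limit and roughly estimate the headroom left after the environment (a sketch only; the true per-argument overhead also includes pointer bookkeeping the kernel charges against the same limit):

```shell
# Raw platform limit for the combined command line + environment, in bytes.
limit=$(getconf ARG_MAX)

# Approximate size of the current environment; the space actually usable
# for arguments is roughly the limit minus this.
envsize=$(env | wc -c)

echo "ARG_MAX: $limit; approx. available for arguments: $((limit - envsize))"
```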

In practice, 5000 files will likely not be a problem, even on platforms with a relatively low max. length, such as macOS - unless you have exceptionally long filenames and/or your globbing pattern has a lengthy path component.[1]
Recent Linux versions have a much higher limit.

If you do hit the limit and must work around it, use xargs as follows:

printf '%s\0' * | xargs -0 grep -d skip -L '^- '

Note that while -0 to read NUL-terminated input is not POSIX-compliant, it is supported by both GNU and BSD/macOS xargs.

If the input filenames indeed don't fit on a single command line, xargs will partition the input in a way that results in the fewest grep invocations necessary to process all of them.
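To make the partitioning visible, you can artificially cap the number of arguments per invocation with -n (a demonstration with hypothetical filenames; in real use you would omit -n and let xargs fit as many as the limit allows):

```shell
# Force at most 2 arguments per invocation so the batching is observable;
# xargs splits the NUL-terminated input into 3 separate 'echo' calls.
printf '%s\0' file1 file2 file3 file4 file5 |
  xargs -0 -n 2 echo invoked-with:
# → invoked-with: file1 file2
# → invoked-with: file3 file4
# → invoked-with: file5
```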


[1] macOS 10.12 has a limit of 262,144 bytes (256 KB); if we conservatively assume that, after deducting the size of the environment and the fixed part of the command line, we get 250,000 bytes for our filename list, this gives us 250000 / 5000 == 50 bytes per filename + space (the list separator), so that each filename is allowed to be up to 49 bytes long.
By contrast, Ubuntu 16.04's limit is 8 times as high: 2,097,152 bytes (2 MB).

Upvotes: 2

Jamil Said

Reputation: 2093

Here is another way to do this, using grep -L:

find . -maxdepth 1 -type f -exec grep -L '^- ' {} \;

The code above lists all files in the directory whose contents do NOT contain a line starting with dash + space (- ).

To make the code above recursive (that is, to extend the search to all subdirectories), just remove the -maxdepth 1 part.

From man grep about option -L:

-L, --files-without-match Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.
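A quick way to see -L in action, using a scratch directory with hypothetical filenames:

```shell
# Set up one file that contains a "- " line and one that doesn't.
dir=$(mktemp -d)
printf -- '- ok line\n' > "$dir/good.txt"
printf 'missing marker\n' > "$dir/bad.txt"

# -L prints only the file that lacks a line starting with "- ":
grep -L '^- ' "$dir"/*
# → prints the path of bad.txt only

rm -r "$dir"
```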

Upvotes: 2

hek2mgl

Reputation: 158040

You can use boolean logic with find:

find . -maxdepth 1 -type f \( -exec grep -q '^- ' {} \; -o -print \)

The -o operator is a logical OR. If the command executed by -exec returns a non-zero exit status, -print prints the filename.
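The short-circuit behavior can be verified in a scratch directory (hypothetical filenames):

```shell
# One file contains the "- " marker, one doesn't.
dir=$(mktemp -d)
printf -- '- ok line\n' > "$dir/good.txt"
printf 'missing marker\n' > "$dir/bad.txt"

# For good.txt, grep -q succeeds, so the OR short-circuits and -print is
# skipped; for bad.txt, grep fails and control falls through to -print.
find "$dir" -maxdepth 1 -type f \( -exec grep -q '^- ' {} \; -o -print \)
# → prints the path of bad.txt only

rm -r "$dir"
```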

Upvotes: 3
