chimaira

Reputation: 181

Combine find, grep and xargs with printf

I have a find command that combines -exec grep with -printf:

find -L /home/blast/dirtest -maxdepth 3 -exec grep -q "pattern" {} \; -printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null

Result :

f/#/2018-01-01 10:00:00/#/191/#/filee.xml/#//#//home/blast/dirtest/01/05

I need -printf so that I get all the desired file information at once (date, type, size, etc.).

The above command works fine, but -exec is too slow compared to xargs.

I tried to do the same with xargs but did not succeed. Any idea how to achieve that using xargs while keeping the desired -printf output, or something similar?

Thanks

Upvotes: 0

Views: 1807

Answers (3)

jhnc

Reputation: 16662

Your code is:

find  -L /home/blast/dirtest -maxdepth 3 \
    -exec grep -q  "pattern" {} \; \
    -printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null

This invokes a new grep process for each file.

If you are using GNU utilities, you can reduce the number of grep processes by something like:

(
    format=\''%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n'\'

    find -L /home/blast/dirtest -maxdepth 3 -print0 |\
    xargs -0 grep -l -Z "pattern" |\
    xargs -0 sh -c 'find "$@" -printf '"$format" --
) 2>/dev/null
  • for clarity, store the format string in a variable
  • use -print0 / -0 / -Z options to enable null-delimited data
  • generate initial filelist with find
  • filter on "pattern" with grep (use of xargs minimises the number of times grep gets called)
  • feed the filtered filelist into another xargs to run a minimal number of find -printf
  • in the second xargs, call a child shell (sh -c) so that extra arguments can be appended after the file names (find requires the paths to precede the expression)
  • dummy second argument (--) to the sh -c invocation prevents the first filename being lost due to assignment to $0
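
The last point is easy to check in isolation. A minimal sketch, with made-up file names (a.txt, b.txt, c.txt) standing in for whatever xargs would append:

```shell
# Without a placeholder, the first argument becomes $0 and is missing from "$@".
without=$(sh -c 'printf "%s\n" "$@"' a.txt b.txt c.txt)

# With a placeholder ("--" here; any word would do), every name lands in "$@".
with=$(sh -c 'printf "%s\n" "$@"' -- a.txt b.txt c.txt)

printf 'without placeholder:\n%s\n' "$without"
printf 'with placeholder:\n%s\n' "$with"
```

In the first call a.txt is swallowed as $0, which is exactly the "lost first filename" the placeholder guards against.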

Upvotes: 3

chimaira

Reputation: 181

I've found an interesting thing about the -exec option: we can run grep far fewer times by ending -exec with a plus sign (+) instead of a semicolon.

-exec command {} +
              This variant of the -exec option runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the  total
              number  of  invocations  of  the  command  will be much less than the number of matched files.  The command line is built in much the same way that xargs builds its command
              lines.  Only one instance of ’{}’ is allowed within the command.  The command is executed in the starting directory.

That means that changing this:

-exec grep -l 'pattern'  {} \;

to this (replacing the semicolon with a plus sign):

-exec grep -l 'pattern'  {} \+

improves performance significantly.

Then I only need to pipe into a single xargs, which handles the formatted printing.
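
Put together, that could look roughly like the sketch below. It assumes GNU find, grep and xargs; the temporary directory, the two sample files, the added -type f and the shortened format string are only there to make the example self-contained, they are not part of the original command.

```shell
# Hypothetical sandbox: two files, only one of which contains "pattern".
dir=$(mktemp -d)
printf 'a pattern here\n' > "$dir/hit.xml"
printf 'nothing\n' > "$dir/miss.xml"

# grep runs once per batch of files ("+") and prints the matching names
# NUL-terminated (-Z); xargs then hands them to a second find that only
# formats them (-maxdepth 0 keeps it from descending any further).
result=$(
    find -L "$dir" -maxdepth 3 -type f -exec grep -lZ 'pattern' {} + |
    xargs -0 -r sh -c 'find "$@" -maxdepth 0 -printf "%y/#/%s/#/%f\n"' --
)
printf '%s\n' "$result"
rm -rf "$dir"
```

Note that -printf cannot simply be chained after -exec ... {} + the way it was after -exec ... {} \; because the + form always returns true, so it no longer acts as a filter inside find.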

Upvotes: 0

root

Reputation: 6048

To do it exactly how you want:

find  -L /home/blast/dirtest/ -maxdepth 3 \
    -printf '%p@%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' \
    > tmp.out
cut -d@ -f1 tmp.out \
    | xargs grep -l "pattern" 2>/dev/null \
    | sed 's/^/^/; s/$/@/' \
    | grep -f /dev/stdin tmp.out \
    | sed 's/^.*@//'

This assumes that none of your file names contain the character @.

What it does is avoid the grep at first and just dump all the files with the requested metadata to a temporary file.

But it also prefixes each line with the full path (%p@).

Then we extract (cut) the full paths out of this list and list the files which contain the pattern (xargs grep).

We then use sed to prefix each such file name with ^ and suffix it with @, which makes it a greppable pattern in our tmp.out file.

Then we use this pattern (grep -f /dev/stdin) to extract only those paths from the big list in tmp.out.

Now all that's left is to remove the artificial full path we prefixed using the last sed command.
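
To see the pattern-building and prefix-stripping steps on their own, here is a small sketch. The two listing lines are made up (modelled on the sample output in the question, with fewer fields), and it uses grep -e on a single pattern instead of grep -f /dev/stdin just to stay short.

```shell
# Two lines of the kind tmp.out would hold: full path, "@", then the metadata.
listing='/home/blast/dirtest/01/05/filee.xml@f/#/191/#/filee.xml
/home/blast/dirtest/01/05/other.xml@f/#/20/#/other.xml'

# Turn a matching path into an anchored regex of the form ^<path>@
anchored=$(printf '%s\n' '/home/blast/dirtest/01/05/filee.xml' | sed 's/^/^/; s/$/@/')

# Select the listing line for that path, then drop the "<path>@" prefix.
selected=$(printf '%s\n' "$listing" | grep -e "$anchored" | sed 's/^.*@//')

printf '%s\n' "$anchored" "$selected"
```

Keep in mind that the path is used as a regular expression here, so characters such as . in a file name match more loosely than a literal comparison would.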

Seeing how you used /home, there's a good chance you're on Linux, which, if you're willing to accept some output format changes, allows you to do it somewhat more elegantly:

find -L /home/blast/dirtest/ -maxdepth 3 \
    | xargs grep -l "pattern" 2>/dev/null \
    | xargs stat --printf '%F/#/%y/#/%s/#/%n\n'

The output of stat --printf is different from that of find -printf (and from that of macOS's stat -f), but it's the same information.
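
If your file names may contain spaces or newlines, a null-delimited variant of the same pipeline is safer. This is only a sketch assuming GNU find, grep, xargs and stat; the temporary directory, the sample files and the added -type f are made up for the demonstration, and %y (modification time) is left out of the format so the output is reproducible.

```shell
dir=$(mktemp -d)
printf 'a pattern here\n' > "$dir/hit file.xml"   # note the space in the name
printf 'nothing\n' > "$dir/miss.xml"

# -print0 / -0 / -Z keep each file name intact from one stage to the next;
# -r stops xargs from running the command when nothing matched.
out=$(
    find -L "$dir" -maxdepth 3 -type f -print0 |
    xargs -0 -r grep -lZ 'pattern' |
    xargs -0 -r stat --printf '%F/#/%s/#/%n\n'
)
printf '%s\n' "$out"
rm -rf "$dir"
```

With plain xargs (no -0), "hit file.xml" would be split into two arguments and neither grep nor stat would find the file.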

Do note, however, that because you passed -L to find, and you're grepping the result:

  1. The results are limited to file types which can be grepped, so they will never be directories, links, etc.
  2. If you stumble upon a broken link, it will not be in the output because it cannot be grepped.

Upvotes: 1
