Reputation: 181
I have a find command combined with -exec grep and a -printf option:
find -L /home/blast/dirtest -maxdepth 3 -exec grep -q "pattern" {} \; -printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
Result:
f/#/2018-01-01 10:00:00/#/191/#/filee.xml/#//#//home/blast/dirtest/01/05
I need -printf so that I get all the desired file information at once (date, type, size, etc.).
The above command works fine, but the -exec option is too slow compared to xargs.
I tried to do the same with xargs but did not succeed. Any idea how to achieve that using xargs, while keeping the desired -printf output or something similar?
Thanks
Upvotes: 0
Views: 1807
Reputation: 16662
Your code is:
find -L /home/blast/dirtest -maxdepth 3 \
-exec grep -q "pattern" {} \; \
-printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
This invokes a new grep process for each file.
If you are using GNU utilities, you can reduce the number of grep processes with something like:
(
format=\''%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n'\'
find -L /home/blast/dirtest -maxdepth 3 -print0 |\
xargs -0 grep -l -Z "pattern" |\
xargs -0 sh -c 'find "$@" -printf '"$format" --
) 2>/dev/null
- Use the -print0 / -0 / -Z options to enable null-delimited data.
- find generates the list of candidate paths.
- Filter for files containing "pattern" with grep (use of xargs minimises the number of times grep gets called).
- Use a second xargs to run a minimal number of find -printf invocations.
- In that second xargs, call a subshell so that extra arguments can be appended (find requires the paths to precede the operators).
- Adding a dummy argument (--) to the sh -c invocation prevents the first filename being lost due to assignment to $0.
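The role of the -- placeholder passed to sh -c can be checked in isolation. This is a minimal sketch, assuming an xargs that supports -0; the names a, b and c are made-up stand-ins for file names:

```shell
# Without a placeholder, sh -c assigns the first argument to $0,
# so "$@" only starts at the second name.
without=$(printf '%s\0' a b c | xargs -0 sh -c 'printf "%s," "$@"')

# With "--" as a placeholder, $0 becomes "--" and "$@" holds every name.
with=$(printf '%s\0' a b c | xargs -0 sh -c 'printf "%s," "$@"' --)

echo "without placeholder: $without"
echo "with placeholder:    $with"
```

Any string would do as the placeholder; -- is only a convention.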
Upvotes: 3
Reputation: 181
I've found an interesting thing about the -exec option: grep can be run far fewer times by terminating -exec with a plus sign (+):
-exec command {} +
This variant of the -exec option runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command lines. Only one instance of '{}' is allowed within the command. The command is executed in the starting directory.
That means that changing this:
-exec grep -l 'pattern' {} \;
to this (replacing the semicolon with a plus sign):
-exec grep -l 'pattern' {} \+
improves performance significantly.
The list of matching files can then be piped to a single xargs that handles only the formatted printing.
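Put together, the idea might look like the following. This is only a sketch, assuming GNU find, grep and xargs; the scratch directory, the file names and the shortened format string are made up for illustration, and -type f is an addition that keeps grep from being handed directories:

```shell
# Hypothetical scratch directory with one matching and one non-matching file.
dir=$(mktemp -d)
printf 'has pattern here\n' > "$dir/match.xml"
printf 'nothing relevant\n' > "$dir/other.xml"

# grep runs once per batch of files: -l lists the matching files and
# -Z (GNU grep) terminates each name with NUL instead of a newline.
# xargs -0 then hands the matches to a second find used only for formatting;
# -r (GNU xargs) skips that step when nothing matched.
result=$(find -L "$dir" -maxdepth 3 -type f -exec grep -lZ "pattern" {} + |
  xargs -0 -r sh -c 'find "$@" -printf "%y/#/%f\n"' --)

echo "$result"
rm -rf "$dir"
```

Note that -exec ... {} + always evaluates to true, so unlike the \; form it cannot act as a filter in front of -printf within the same find; that is why the formatting happens in a second stage.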
Upvotes: 0
Reputation: 6048
To do it exactly how you want:
find -L /home/blast/dirtest/ -maxdepth 3 \
-printf '%p@%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' \
> tmp.out
cut -d@ -f1 tmp.out \
| xargs grep -l "pattern" 2>/dev/null \
| sed 's/^/^/; s/$/@/' \
| grep -f /dev/stdin tmp.out \
| sed 's/^.*@//'
This operates under the assumption that you have no @ character in your file names.
What it does is avoid the grep at first and just dump all the files with the requested metadata to a temporary file. But it also prefixes each line with the full path (%p@).
Then we extract (cut) the full paths out of this list and list the files which contain the pattern (xargs grep).
We then use sed to prefix each such file name with ^ and suffix it with @, which makes it a greppable pattern in our tmp.out file.
Then we use this pattern (grep -f /dev/stdin) to extract only those paths from the big list in tmp.out.
Now all that's left is to remove the artificial full path we prefixed, using the last sed command.
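The sed and grep steps can be tried on a single line of tmp.out. This is a minimal sketch using the path from the sample output in the question; the metadata part of the line is shortened for illustration:

```shell
# One line as it would appear in tmp.out: full path, "@", then the metadata.
line='/home/blast/dirtest/01/05/filee.xml@f/#/191/#/filee.xml'

# Turn the bare path into an anchored pattern: "^" in front, "@" behind.
anchored=$(printf '%s\n' '/home/blast/dirtest/01/05/filee.xml' | sed 's/^/^/; s/$/@/')

# Select the line with that pattern, then strip everything up to the "@".
kept=$(printf '%s\n' "$line" | grep "$anchored" | sed 's/^.*@//')

echo "$anchored"
echo "$kept"
```

Because the path is used as a regular expression, characters such as . match more loosely than a literal comparison would; for typical file names this is harmless.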
Seeing how you used /home, there's a good chance you're on Linux, which, if you're willing to accept some output format changes, allows you to do it somewhat more elegantly:
find -L /home/blast/dirtest/ -maxdepth 3 \
| xargs grep -l "pattern" 2>/dev/null \
| xargs stat --printf '%F/#/%y/#/%s/#/%n\n'
The output of stat --printf is different from that of find -printf (and from that of macOS' stat -f), but it's the same information.
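If file names may contain whitespace, the stages can exchange null-delimited names instead. This is a sketch assuming GNU find, grep, xargs and stat; the scratch directory and file name are made up, the format string is shortened, and -type f is an addition that keeps grep from being handed directories:

```shell
# Hypothetical scratch directory holding a file whose name contains a space.
dir=$(mktemp -d)
printf 'pattern\n' > "$dir/a file.txt"

# -print0, -0 and -Z keep each name intact between the stages;
# -r (GNU xargs) skips stat when grep found no matching file.
out=$(find -L "$dir" -maxdepth 3 -type f -print0 |
  xargs -0 grep -lZ "pattern" 2>/dev/null |
  xargs -0 -r stat --printf '%F/#/%s/#/%n\n')

echo "$out"
rm -rf "$dir"
```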
Do note, however, that because you passed -L to find, and you're grepping the result:
Upvotes: 1