Reputation: 5335
I need to find the reports (.docx files), read them with docx2txt, find the second match of "passed" (excluding "not passed") and save these filenames to a text file. Here is what I tried:
OIFS="$IFS"
IFS=$'\n'
for f in $(find . -wholename '*_done/(*Report*.docx' |grep -v appendix)
do
docx2txt "$f" - | (grep -q -m2 passed || grep -q -v "not passed") || echo $f >> failed
done
IFS="$OIFS"
But this script gives me an empty file. If I replace || with && before echo, all filenames are stored in the file. grep works fine outside the script, and so does docx2txt. What am I doing wrong here?
Upvotes: 0
Views: 211
Reputation: 27205
There are quite a lot of problems with the grep commands.
grep -q always exits successfully on the first match.
With -q, the -m2 has no effect. If there is one match, grep exits successfully. It does not check whether there is a second match.
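A minimal sketch of that behaviour, assuming GNU or BSD grep (both accept -m). The input is made-up text standing in for docx2txt output and contains only a single "passed" line:

```shell
#!/bin/sh
# Made-up input (assumption): only ONE line contains "passed".
# The command still succeeds: -q makes grep exit with status 0 at the
# first match, so the -m2 limit (a second matching line) never matters.
if printf 'unit tests passed\nsummary\n' | grep -q -m2 passed; then
    echo "grep -q -m2 succeeded after one match"
fi
```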
To check that there are (at least) two matches, count the matches and then use test/[ ] to check the number of found matches. If there is at most one passed per line, grep -c is sufficient. If there can be multiple matches per line, you need grep -o ... | wc -l.
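Both counting approaches can be sketched as small shell functions. The names has_two_passed and count_passed_matches are hypothetical, and both functions assume the converted report text arrives on standard input (for example from docx2txt "$f" -):

```shell
#!/bin/sh
# has_two_passed (hypothetical helper): succeeds when at least two LINES
# contain "passed". Assumes at most one "passed" per line.
has_two_passed() {
    # grep -c prints the number of matching lines; -m2 stops after two.
    count=$(grep -c -m2 passed)
    [ "$count" -ge 2 ]
}

# count_passed_matches (hypothetical helper): counts MATCHES, not lines.
# -o prints every match on its own line, and wc -l counts those lines.
count_passed_matches() {
    grep -o passed | wc -l
}
```

Usage would look like docx2txt "$f" - | has_two_passed && echo "$f" >> failed.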
-q and -v together mean: is there at least one line that does not contain the pattern? When grep finds such a line, it exits successfully. The only way for this command to fail is an input in which every line contains "not passed" (this includes the empty file).
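The three cases can be checked directly. This is a sketch with made-up input lines rather than real reports:

```shell
#!/bin/sh
# grep -q -v PATTERN succeeds if at least one line does NOT match PATTERN.
# A plain "passed" line is such a line, so this succeeds:
printf 'passed\n' | grep -q -v 'not passed' && echo "succeeds: a line without 'not passed' exists"

# It only fails when every line contains the pattern, or when there is no input:
printf 'not passed\n' | grep -q -v 'not passed' || echo "fails: every line contains 'not passed'"
printf '' | grep -q -v 'not passed' || echo "fails: empty input"
```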
Matching "passed" but not "not passed" is trickier than one might suspect. If there can be at most one passed/not passed per line, you can use grep -v 'not passed' | grep passed. Otherwise you need a negative lookbehind, which is only available in Perl-compatible regular expressions (PCRE).
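Both variants, sketched on made-up input lines. The second one assumes a grep with -P support (for example GNU grep built with PCRE); BSD/macOS grep does not accept -P:

```shell
#!/bin/sh
# At most one "passed"/"not passed" per line: drop the lines containing
# "not passed", then keep the remaining lines that contain "passed".
printf 'test 1 passed\ntest 2 not passed\n' | grep -v 'not passed' | grep passed
# prints: test 1 passed

# Several occurrences per line (assumes grep -P is available): the negative
# lookbehind (?<!not ) rejects any "passed" directly preceded by "not ".
printf 'a passed, b not passed, c passed\n' | grep -oP '(?<!not )passed' | wc -l
# counts 2 matches
```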
In addition to that, command | (grep ... || grep ...) might not do what you expect. command produces its output only once. After the first grep has read some of this output, that part is gone. The second grep then continues reading where the first grep stopped.
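A small sketch of the "read once" effect, using the shell's read builtin (which on a pipe consumes exactly one line) so the result is predictable. The second half shows one possible workaround, not taken from the answer: store the output once and give each grep its own copy. Here printf stands in for docx2txt "$f" - (assumption):

```shell
#!/bin/sh
# Two commands sharing one pipe: the first consumes part of the stream,
# the second only sees what is left.
printf 'first line\nsecond line\n' | {
    read -r line   # consumes "first line"
    cat            # prints only "second line"
}

# Workaround: capture the output once (printf stands in for docx2txt),
# then feed a fresh copy to each command.
text=$(printf 'first line\nsecond line\n')
printf '%s\n' "$text" | grep -c line     # prints 2
printf '%s\n' "$text" | grep -c second   # prints 1
```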
BTW: for … in $(find … | grep -v …) can be turned into a single, safe find command using -not and -exec.
If each line contains at most one passed/not passed, use:
find . -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
-exec sh -c '[ $(docx2txt "$0" - | grep -v "not passed" | grep -cm2 passed) = 2 ]' {} \; -print
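In the find command above, the file name reaches the inner script as $0 because it is the first argument after the sh -c script string, and it stays a single word even if it contains spaces. A small sketch of that mechanism, using a made-up path:

```shell
#!/bin/sh
# sh -c 'SCRIPT' ARG0 ARG1 ...: the first extra argument becomes $0,
# the following ones become $1, $2, ... find substitutes {} there.
sh -c 'echo "file is: $0"' './a_done/(Report 1.docx'
# prints: file is: ./a_done/(Report 1.docx

# A common alternative passes a dummy $0 so the file name lands in $1:
sh -c 'echo "file is: $1"' sh './a_done/(Report 1.docx'
```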
If there can be multiple passed/not passed per line, you need GNU grep or pcregrep:
# -m2 limits matching lines, not matches, so -o can print more than 2; hence -ge
find . -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
-exec sh -c '[ $(docx2txt "$0" - | grep -Pom2 "(?<!not )passed" | wc -l) -ge 2 ]' {} \; -print
Upvotes: 2
Reputation: 311506
When you run into a problem like this, it's a good idea to remove as much code as possible. If we just take that one line with the multiple grep statements, we can first verify that the current expression doesn't work:
$ echo passed | (grep -q -m2 passed || grep -q -v "not passed") || echo failed
$ echo not passed | (grep -q -m2 passed || grep -q -v "not passed") || echo failed
We can see that neither of these commands produces any output.
Let's think carefully about the logic:
The || operator means "if the first command doesn't succeed, run the second command". In both cases the first grep succeeds, because both "passed" and "not passed" contain the phrase "passed". This means the second grep never runs, and since the first command was successful, the entire grep ... || grep ... command is successful, which means the final echo $f never runs.
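The same short-circuit can be reproduced without any files. This is a sketch in which the last echo stands in for echo "$f" >> failed (assumption):

```shell
#!/bin/sh
# a || b runs b only when a fails (non-zero exit status).
true  || echo "not printed: true succeeded"
false || echo "printed: false failed"

# The question's structure: the first grep matches "passed" inside
# "not passed", so the group succeeds and the final echo never runs.
printf 'not passed\n' | (grep -q -m2 passed || grep -q -v 'not passed') || echo "never printed"
```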
I was trying to think of a clever way to solve this, but it seems simplest to use a temporary file:
OIFS="$IFS"
IFS=$'\n'
tmpfile=$(mktemp docXXXXXX)
trap 'rm -f "$tmpfile"' EXIT
for f in $(find . -wholename '*_done/(*Report*.docx' | grep -v appendix)
do
    # keep only the first two lines of the converted report
    docx2txt "$f" - | head -n 2 > "$tmpfile"
    # record the file if those lines mention "passed" but not "not passed"
    if grep -q passed "$tmpfile" && ! grep -q 'not passed' "$tmpfile"; then
        echo "$f" >> failed
    fi
done
IFS="$OIFS"
Upvotes: 2