Reputation: 93
I ran the following
grep -irln "mold"
against a directory using Cygwin on my Windows 7 Enterprise machine at work, and it found a match in a particular PDF file. However, when I open the file in Adobe or Chrome and press Ctrl+F to search for mold, no results are found. This PDF has been through an OCR service. So my question is: how can grep return a match when Ctrl+F on the open file finds nothing?
Upvotes: 0
Views: 44
Reputation: 8496
It seems you are overlooking that grep looks for every occurrence in the raw contents of a file, and that a PDF file is written in a markup language that describes the graphical appearance of text and images.
Using a very simple text file as an example
$ cat << EOF > example.txt
> one dog
> two cats
> three chickens
> EOF
we convert it to PostScript and then to PDF
$ a2ps example.txt -o example.ps
[example.txt (plain): 1 page on 1 sheet]
[Total: 1 page on 1 sheet] saved into the file `example.ps'
$ ps2pdf example.ps example.pdf
So we have three files with the same text, but the PostScript and PDF files have their specific markup language around the original text.
Now if we ask grep to look for the chicken
$ grep chicken example.*
example.ps:(three chickens) N
example.txt:three chickens
You can see that the PDF file does not contain chicken as plain text. This is because the original text is compressed inside the PDF.
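The effect of compression on grep can be reproduced without a PDF at all. The following is only a sketch: it uses gzip as a stand-in, on the assumption that it behaves like the Flate compression commonly applied to PDF content streams (both are based on DEFLATE). The variable names are made up for the illustration.

```shell
# Illustration only: gzip stands in for the PDF's stream compression.
text='three chickens'

# Plain text: grep -c reports one matching line.
plain=$(printf '%s\n' "$text" | grep -c chicken || true)

# Compressed bytes: -a makes grep treat binary data as text, but the
# word is no longer present as a literal byte sequence.
# grep exits non-zero when the count is 0, hence the "|| true".
packed=$(printf '%s\n' "$text" | gzip -c | grep -ac chicken || true)

# Decompressing first makes the text searchable again.
restored=$(printf '%s\n' "$text" | gzip -c | gunzip -c | grep -c chicken || true)

echo "plain=$plain packed=$packed restored=$restored"
```

On a typical system this should print plain=1 packed=0 restored=1: the word is still in the data, but not in a form that a byte-wise search can see.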
Your result of mold is a false positive. The text inside the PDF is compressed, so grep cannot find it. Most likely the case-insensitive pattern happened to match four bytes somewhere in the binary data.
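To search the text a PDF viewer would actually see, one option is to extract the text first and grep that. This is a minimal sketch, assuming the pdftotext tool (part of Poppler/Xpdf) is installed, which may not be the case on a stock Cygwin setup. The function name pdf_grep is hypothetical, not a standard command.

```shell
# pdf_grep is a hypothetical helper name, not a standard command.
# Usage: pdf_grep PATTERN FILE.pdf
pdf_grep() {
    # Assumption: pdftotext (Poppler/Xpdf) is on the PATH.
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found (it ships with Poppler/Xpdf)" >&2
        return 2
    fi
    # "-" as the output name makes pdftotext write to stdout,
    # so grep sees the extracted text instead of the raw PDF bytes.
    pdftotext "$2" - | grep -i -- "$1"
}

# Example call (file name is a placeholder): pdf_grep mold scanned.pdf
```

If this finds nothing in the OCR'd file either, that would be consistent with the Ctrl+F result and with the grep hit being a coincidence in the binary data.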
Upvotes: 1