Reputation: 45
The PDF has some text written in black. When I open the document in a viewer, I cannot see some of that text (the "INSERT" placeholders), which is overlapped by other text. If I select the area, the text shows up. You can search for the text "INSERT" in the document.
I can't see anything that would make the text hidden. Can anybody explain what is making the text invisible in the viewer?
Upvotes: 0
Views: 905
Reputation: 17169
The trick used here to hide some of the text is called clipping.
Each block of text or graphics in a PDF document may be accompanied by a clipping rectangle. After the objects within the block are rendered, only the part that lies within the clipping rectangle is shown on the page.
To see all the text contained in the document, you can use any utility that extracts text from PDF documents, such as pdftotext, which is part of the Poppler toolkit.
$ pdftotext ../x.pdf - | grep INSERT
[INSERT TABLE TITLE HERE]
Source: [INSERT SOURCE TEXT HERE]
[INSERT Group
Source: [INSERT SOURCE TEXT HERE]
[INSERT TABLE TITLE HERE]
Source: [INSERT SOURCE TEXT HERE]
[INSERT
Source: [INSERT SOURCE TEXT HERE]
This shows that there are four tables hidden in this document. To examine the document structure and see the clipping rectangles you should use one of the PDF APIs, such as iText or Poppler.
The rest of the answer is based on SVG notation instead of PDF, because PDF operators form a low-level language that is difficult to read and write directly. For the curious, an example that uses PDF markup is in the next section.
SVG is another format for vector graphics. A PDF document can be converted to SVG with most of its features preserved, which turns it into a reasonably human-readable form.
Below is a small fragment of your document converted to SVG using Inkscape. The SVG source is
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" width="744" height="1052" viewBox="0 0 744 1052" version="1.1">
<defs>
<clipPath id="clipPath37876" clipPathUnits="userSpaceOnUse">
<path style="clip-rule:evenodd" d="m 170.12,340 382.7,0 0,21.02 -382.7,0 0,-21.02 z"/>
</clipPath>
<clipPath id="clipPath44260" clipPathUnits="userSpaceOnUse">
<path style="clip-rule:evenodd" d="m 170.12,315 382.7,0 0,16.02 -382.7,0 0,-16.02 z"/>
</clipPath>
</defs>
<g clip-path="url(#clipPath37876)" id="g37874">
<text style="font-weight:bold;font-size:9px;font-family:Arial;fill:#000000;fill-opacity:1" transform="matrix(1,0,0,1,170.12,335)">
<tspan>[INSERT TABLE TITLE HERE]</tspan>
</text>
</g>
<g clip-path="url(#clipPath44260)" id="g44258">
<text style="font-weight:bold;font-size:9px;font-family:Arial;fill:#000000;fill-opacity:1" transform="matrix(1,0,0,1,170.12,325)">
<tspan>December factory</tspan>
</text>
<text style="font-weight:bold;font-size:9px;font-family:Arial;fill:#000000;fill-opacity:1" transform="matrix(1,0,0,1,248.6,325)">
<tspan>shipments</tspan>
</text>
<text style="font-weight:bold;font-size:9px;font-family:Arial;fill:#000000;fill-opacity:1" transform="matrix(1,0,0,1,296.18,325)">
<tspan>summary</tspan>
</text>
</g>
<text style="font-weight:bold;font-size:12px;font-family:Arial;fill:#db0011;fill-opacity:1" transform="matrix(1,0,0,1,170.12,443.06)">
<tspan>Valuation and risks</tspan>
</text>
</svg>
Here the clipping rectangle
<clipPath id="clipPath37876" clipPathUnits="userSpaceOnUse">
<path style="clip-rule:evenodd" d="m 170.12,340 382.7,0 0,21.02 -382.7,0 0,-21.02 z"/>
</clipPath>
is positioned so that the text in the block it is attached to is rendered completely outside the bounds of the rectangle. If you replace the rectangle's starting point 170.12,340 with 170.12,325, the document is rendered with all the text visible.
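The effect of moving the starting point can be checked with a little arithmetic. The following is only a rough sketch, not how a renderer works: it assumes the glyphs occupy roughly one font size above the baseline, and the names SVG, clip_y_range and text_overlaps_clip are made up for this illustration. The SVG is a reduced copy of the fragment above with the clip's y coordinate left as a parameter.

```python
import re
import xml.etree.ElementTree as ET

# Reduced copy of the fragment above; {clip_y} is the value being changed.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <defs>
    <clipPath id="clipPath37876" clipPathUnits="userSpaceOnUse">
      <path d="m 170.12,{clip_y} 382.7,0 0,21.02 -382.7,0 0,-21.02 z"/>
    </clipPath>
  </defs>
  <g clip-path="url(#clipPath37876)">
    <text style="font-size:9px" transform="matrix(1,0,0,1,170.12,335)">
      <tspan>[INSERT TABLE TITLE HERE]</tspan>
    </text>
  </g>
</svg>"""

NS = {"svg": "http://www.w3.org/2000/svg"}


def clip_y_range(path_d):
    """Return (top, bottom) of a relative 'm x,y dx,dy ... z' path."""
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", path_d)]
    y, ys = 0.0, []
    # After a lowercase 'm' every pair is an offset from the previous point,
    # so the y coordinates are a running sum of every second number.
    for dy in numbers[1::2]:
        y += dy
        ys.append(y)
    return min(ys), max(ys)


def text_overlaps_clip(svg_source):
    """True if the (approximate) text band overlaps the clip vertically."""
    root = ET.fromstring(svg_source)
    path = root.find(".//svg:clipPath/svg:path", NS)
    clip_top, clip_bottom = clip_y_range(path.get("d"))
    text = root.find(".//svg:text", NS)
    font_size = float(re.search(r"font-size:([\d.]+)px", text.get("style")).group(1))
    # The last matrix entry is the vertical translation, i.e. the baseline.
    baseline = float(text.get("transform").rstrip(")").split(",")[-1])
    # Assumption: glyphs span about one font size above the baseline
    # (y grows downwards in SVG, so "above" means smaller y).
    text_top = baseline - font_size
    return text_top < clip_bottom and baseline > clip_top


print(text_overlaps_clip(SVG.format(clip_y=340)))  # clip starts below the text
print(text_overlaps_clip(SVG.format(clip_y=325)))  # clip covers the text
```

With the starting point at 340 the clip begins below the baseline at 335, so the two ranges do not overlap; with 325 the text band falls inside the clip.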
The following code uses PDF markup operators to print a string of text within a clipping rectangle placed so that the text fits inside.
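stream
0 0 200 20 re            % clipping rectangle: x y width height
W n                      % use the path as the clip, then discard it without painting
q                        % save the graphics state
BT                       % begin text object
/F1_0 18 Tf              % select font /F1_0 at 18 points
0 5 Td                   % move the text position to (0, 5)
([INSERT TEXT HERE]) Tj  % show the string
ET                       % end text object
Q                        % restore the graphics state
endstream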
Notice that we first define a clipping rectangle 0 0 200 20, which is 200 points wide and 20 points high. Then we put some text starting at location (0 5) using an 18 point font.
The outline of the clipping rectangle is shown in blue.
Now if we replace the clipping rectangle with 0 10 200 20, only the part of the text that fits inside the clipping rectangle is shown. The rectangle now starts at y = 10 while the text baseline is still at y = 5, so the lower part of the glyphs is cut off.
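To put numbers on this, here is a minimal sketch that reads the rectangle, font size and text position out of such a stream and computes which vertical slice of the text survives the clip. It rests on assumptions that are not in the PDF itself: capital letters are taken to be about 0.72 of the font size high, descenders are ignored, and the regular expressions only handle streams shaped like the one above. STREAM and visible_band are names invented for this example.

```python
import re

# The content stream from above with the rectangle left as a parameter.
STREAM = """
{clip} re
W n
q
BT
/F1_0 18 Tf
0 5 Td
([INSERT TEXT HERE]) Tj
ET
Q
"""


def visible_band(stream, cap_ratio=0.72):
    """Return the (low, high) y range of text left by the clip, or None."""
    # 'x y w h re' followed by 'W n' defines the clipping rectangle.
    x, y, w, h = map(float, re.search(
        r"([\d.]+) ([\d.]+) ([\d.]+) ([\d.]+) re\s+W n", stream).groups())
    font_size = float(re.search(r"([\d.]+) Tf", stream).group(1))
    baseline = float(re.search(r"[\d.]+ ([\d.]+) Td", stream).group(1))
    # Assumed text band: from the baseline up to the cap height.
    text_low, text_high = baseline, baseline + cap_ratio * font_size
    # In PDF y grows upwards, so the clip covers y .. y + h.
    low, high = max(text_low, y), min(text_high, y + h)
    return (low, high) if low < high else None


print(visible_band(STREAM.format(clip="0 0 200 20")))   # whole band survives
print(visible_band(STREAM.format(clip="0 10 200 20")))  # bottom slice is cut
```

Under these assumptions the text band runs from y = 5 to roughly y = 18. The first rectangle covers all of it, while the second one only keeps the slice from y = 10 upwards.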
With the following bit of PDF we can see how new text can appear above the text that was clipped.
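stream
q                        % save the graphics state (no clip yet)
0 14 200 20 re           % clipping rectangle for the black text
W n                      % use the path as the clip, then discard it without painting
q
BT
/F1 18 Tf
0 5 Td
([INSERT TEXT HERE]) Tj  % baseline at y = 5, mostly below the clip
ET
Q
Q                        % restore: the first clip is no longer in effect
100 0 80 14 re           % clipping rectangle for the red text
W n
q
BT
/F1 14 Tf
1.0 0.0 0.2 rg           % fill colour (RGB)
1.0 0.0 0.2 RG           % stroke colour (RGB)
100 5 Td
(new text) Tj
ET
Q
endstream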
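Here the clipping applied to the black text does not apply to the red block, because the first clipping rectangle is set between q and Q, which save and restore the graphics state, including the current clip. As in the previous examples, the outline of the clipping rectangle for the black text is shown in blue.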
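The way q and Q scope the clip can be traced with a toy interpreter. This is a sketch under heavy assumptions, not a PDF parser: it only understands rectangles built with re, the W n pair, q, Q and Tj, and it treats every other operator as a no-op. TOKEN, STREAM, intersect and clips_per_string are names made up for this illustration.

```python
import re

# A token is either a (string) literal or a run of non-space characters.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|[^\s()]+")

STREAM = """
q
0 14 200 20 re
W n
q
BT
/F1 18 Tf
0 5 Td
([INSERT TEXT HERE]) Tj
ET
Q
Q
100 0 80 14 re
W n
q
BT
/F1 14 Tf
1.0 0.0 0.2 rg
1.0 0.0 0.2 RG
100 5 Td
(new text) Tj
ET
Q
"""


def intersect(a, b):
    """Intersect two (x0, y0, x1, y1) rectangles; None means 'no clip yet'."""
    if a is None:
        return b
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def clips_per_string(stream):
    """Return [(string, clip in effect when it is shown), ...]."""
    clip, saved, operands, path, pending, shown = None, [], [], None, False, []
    for tok in TOKEN.findall(stream):
        if tok[0] in "(/+-." or tok[0].isdigit():
            operands.append(tok)          # number, name or string operand
            continue
        if tok == "q":
            saved.append(clip)            # save the graphics state
        elif tok == "Q":
            clip = saved.pop()            # restore it, clip included
        elif tok == "re":
            x, y, w, h = map(float, operands[-4:])
            path = (x, y, x + w, y + h)
        elif tok == "W":
            pending = True                # clip takes effect when the path ends
        elif tok == "n":
            if pending and path is not None:
                clip = intersect(clip, path)
            path, pending = None, False
        elif tok == "Tj":
            shown.append((operands[-1][1:-1], clip))
        operands = []                     # every operator consumes its operands
    return shown


for text, clip in clips_per_string(STREAM):
    print(text, clip)
```

In this model the black string is shown under the rectangle from (0, 14) to (200, 34) and the red string under the one from (100, 0) to (180, 14). If the outer q and its matching Q are removed, the second W n intersects with the first clip, which leaves a region of zero height for the red text.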
Upvotes: 1