Stefan Fabian

Reputation: 510

Determine whether visual object is visible in a PDF using XFINIUM.PDF

While extracting the bounds of visual objects in a PDF using XFINIUM.PDF, I noticed that some of the visual objects weren't actually visible.
However, I couldn't find any property that I could use to determine whether an object is actually visible or not.
To be clear, I don't care about text that is invisible merely because it is hidden behind an image.

Here's an example of what I mean. For some reason this PDF contains a lot of text that is not actually visible on the page. Part of it duplicates text that is visible, and the rest might be from the next page. The black rectangle on the page is the bounding box of the text selected at the top right.

Example image taken from XFINIUM.PDF Inspector

All of the invisible text is a subelement of a Form XObject, but I can't just ignore the XObject: the figure at the top, including its axis descriptions, is also a subelement of that XObject, and I don't want to exclude those.

I have noticed that the visible text (including the axis descriptions) is in the PdfGrayColorSpace and the invisible text is in the PdfIccColorSpace, but I assume just ignoring all IccColorSpace stuff would fail miserably with some other PDFs. I've tried converting the color to RGB, but it converts to (0, 0, 0), which is obviously not helpful.
Any idea how I can determine whether the visual object is visible or not?

The pdf is available here and the example is on page 9.

Upvotes: 1

Views: 171

Answers (1)

mkl

Reputation: 95938

You have already determined that the invisible text is in a Form XObject. It is not visible because it lies outside the bounding box (BBox) of that XObject, and a form's content is clipped to its BBox:

339 0 obj
<<
  /Type /XObject
  /Subtype /Form
  /BBox [ 253.4743 617.9332 447.7891 726.5818 ]
  ...

If you increase the bounding box to the dimensions of a full page

  ...
  /BBox [ 0 0 612 792 ]
  ...

page 9 looks like this:

Page 9 with enlarged form XObject bounds

(Apparently the XObject contains a former version of the page. Probably the original file of the figure got lost, and since then a copy of the page from an earlier version of the document has been used instead.)


Thus:

Any idea how I can determine whether the visual object is visible or not?

Test whether the contents of the Form XObject are inside its BBox.
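
As a rough, library-independent sketch of that test (it does not use the XFINIUM.PDF API, and the names Rect and visible_part are made up for illustration): intersect the object's bounds with the BBox and treat an empty intersection as "not visible". It assumes both rectangles are already in the same coordinate space, so if your bounds are in page coordinates you would first have to map the BBox through the form's Matrix and the current transformation. The two sample bounds are invented; only the BBox values come from the object quoted above.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    left: float
    bottom: float
    right: float
    top: float


def normalized(r: Rect) -> Rect:
    """Order the corners so that left <= right and bottom <= top."""
    return Rect(min(r.left, r.right), min(r.bottom, r.top),
                max(r.left, r.right), max(r.bottom, r.top))


def visible_part(bounds: Rect, bbox: Rect) -> Optional[Rect]:
    """Return the part of bounds inside bbox, or None if nothing remains."""
    a, b = normalized(bounds), normalized(bbox)
    left, bottom = max(a.left, b.left), max(a.bottom, b.bottom)
    right, top = min(a.right, b.right), min(a.top, b.top)
    if left >= right or bottom >= top:
        return None  # empty intersection: clipped away completely
    return Rect(left, bottom, right, top)


# BBox of the Form XObject quoted above.
bbox = Rect(253.4743, 617.9332, 447.7891, 726.5818)

# Hypothetical object bounds, invented for this example.
inside = Rect(300.0, 650.0, 400.0, 660.0)
outside = Rect(72.0, 100.0, 540.0, 112.0)

print(visible_part(inside, bbox) is not None)
print(visible_part(outside, bbox) is not None)
```

An object that only partly overlaps the BBox yields a smaller rectangle, which may be the more useful bounds to report for it. Note that this only covers clipping by the BBox; clip paths set inside the content stream would need separate handling.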

Upvotes: 1
