Shibumi

Reputation: 715

PostScript code to un-hide hidden text in PDF

I have a PDF with some hidden text in it.

When I press [CTRL+a] I see the hidden text in my document viewer.

I can also copy the text and extract it with pdftotext, but I can't recolor it so that it becomes visible in the PDF viewer without pressing [CTRL+a].

So I had the idea that I could use PostScript to change the color of this text object.

But how can I determine what function sets the color or hides the text?

Upvotes: 2

Views: 1451

Answers (2)

Kurt Pfeifle

Reputation: 90263

You cannot use PostScript to achieve what you want. You need to resort to manually editing the PDF file...


There are basically three ways to "hide" text:

  • It could be white text on a white background (or, more generally, text in the same color as its background).
  • It could be covered by another object, say, a white area, or an image.
  • It could be using Text Rendering Mode 3 ("3 Tr").
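
As a rough way to tell which case applies, one could count the relevant operators in an uncompressed copy of the file (for example the qpdf --qdf output used in this answer). The sketch below is only a heuristic: the function name and the dictionary labels are made up for illustration, a plain byte search can also hit data that merely looks like an operator, and it cannot detect text that is covered by another object.

```python
import re

# Byte patterns for the two operator-based hiding techniques.
# The lookarounds keep "13 Tr" or "0.1 1 1 rg" from matching.
CHECKS = {
    "3 Tr": re.compile(rb"(?<![\w.+-])3\s+Tr(?!\w)"),
    "1 1 1 rg/RG": re.compile(rb"(?<![\w.+-])1\s+1\s+1\s+(?:rg|RG)(?!\w)"),
}


def count_hiding_candidates(pdf_bytes: bytes) -> dict:
    """Count occurrences of each suspicious operator in uncompressed PDF data."""
    return {label: len(pattern.findall(pdf_bytes))
            for label, pattern in CHECKS.items()}


# Example (the file name is a placeholder):
#   from pathlib import Path
#   print(count_hiding_candidates(Path("uncompressed.pdf").read_bytes()))
```

A non-zero count only points at a candidate; whether that operator really hides the text in question still has to be checked in the viewer.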

I won't explain the first two cases here, because they are rather unlikely. For the third case, you could proceed like this:

  1. Use qpdf to unpack as many of the compressed 'streams' inside the PDF as possible, creating what qpdf calls the 'QDF mode' of a PDF:

    qpdf --qdf --object-streams=disable input.pdf uncompressed.pdf
    
  2. Open uncompressed.pdf in a good text editor, such as Vim.

  3. Search for the sequence 3 Tr.
    (Text rendering mode 3 is described in the PDF-1.7 specification as "Neither fill nor stroke text (invisible).")

  4. Change it to 0 Tr, 1 Tr or 2 Tr and save the file.
    (Text rendering mode 0 is "Fill text", the default; mode 1 is "Stroke text"; mode 2 is "Fill, then stroke text." Mode 1 will only show the outlines...)

  5. Re-compress the file:

    qpdf uncompressed.pdf input-modified.pdf
    
  6. Open the new file input-modified.pdf in your favourite PDF viewer. It should now show the "un-hidden" text.
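
If there are many occurrences, steps 2 to 4 could also be scripted instead of done by hand. The following is a minimal sketch, not a tested tool: the function name unhide_text_mode is hypothetical, and it assumes the file has already been unpacked with qpdf as in step 1. It swaps one digit for another, so the file length and all byte offsets stay unchanged. Because it is a plain byte search, it may also touch a "3 Tr" that happens to appear inside a string or a stream qpdf could not unpack.

```python
import re

# "3 Tr" as a standalone operand/operator pair: the 3 must not be the tail of
# a longer number, and Tr must not be the start of a longer name.
INVISIBLE_TR = re.compile(rb"(?<![\w.+-])3(\s+)Tr(?!\w)")


def unhide_text_mode(pdf_bytes: bytes, new_mode: int = 0) -> tuple[bytes, int]:
    """Replace every '3 Tr' with '<new_mode> Tr'.

    Returns the modified data and the number of replacements made.
    """
    if new_mode not in range(8):
        raise ValueError("text rendering modes are 0 through 7")
    digit = str(new_mode).encode("ascii")
    # Keep the original whitespace so the byte length does not change.
    return INVISIBLE_TR.subn(lambda m: digit + m.group(1) + b"Tr", pdf_bytes)


# Example (file names are placeholders):
#   from pathlib import Path
#   data = Path("uncompressed.pdf").read_bytes()
#   fixed, count = unhide_text_mode(data)
#   Path("uncompressed.pdf").write_bytes(fixed)
```

Working on bytes rather than decoded text avoids corrupting any binary data that remains in the file.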


Update

Having received a sample PDF file with "hidden" text from the OP (via private channels), I can now confirm that the hiding is indeed achieved by using white text color (RGB white).

To make such text visible:

  1. Unpack the PDF, using qpdf --qdf --object-streams=disable in.pdf unpacked.pdf

  2. Search for all occurrences of 1 1 1 rg and 1 1 1 RG. These set the RGB color to white: the first for non-stroking (fill) operations, the second for stroking operations.

  3. Comments à la %%Contents for page N: in the QDF version of the uncompressed PDF file indicate which page a color setting applies to. (Note that there may be multiple occurrences of the rg and RG operators, each one setting a different (or the same) color for the drawing operations that follow.)

  4. Now replace the white colors with black ones by overwriting the occurrences you found with 0 0 0 rg and 0 0 0 RG. Do not do this all at once; change one occurrence at a time and observe what changes on the respective page after saving. (You may want to avoid turning white text black if it is already on a black background!)
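
The one-at-a-time approach from the steps above could be sketched as follows. This is an illustration under assumptions, not a finished tool: find_white and blacken are hypothetical names, the page number is simply taken from the nearest preceding "Contents for page N" comment (so it is only approximate, for instance for colors set inside shared objects), and only the exact integer form 1 1 1 is matched.

```python
import re

# "1 1 1 rg" (non-stroking) or "1 1 1 RG" (stroking), whitespace preserved.
WHITE_RGB = re.compile(rb"(?<![\w.+-])1(\s+)1(\s+)1(\s+)(rg|RG)(?!\w)")
# qpdf's QDF page comments; tolerate optional space after "%%".
PAGE_COMMENT = re.compile(rb"%%\s*Contents for page (\d+)")


def find_white(pdf_bytes: bytes) -> list:
    """Return (byte_offset, page_or_None, operator) for each white setting."""
    pages = [(m.start(), int(m.group(1)))
             for m in PAGE_COMMENT.finditer(pdf_bytes)]
    hits = []
    for m in WHITE_RGB.finditer(pdf_bytes):
        page = None
        for offset, number in pages:
            if offset > m.start():
                break
            page = number  # last page comment seen before this match
        hits.append((m.start(), page, m.group(4).decode("ascii")))
    return hits


def blacken(pdf_bytes: bytes, index: int) -> bytes:
    """Turn only the index-th white setting into black (same byte length)."""
    m = list(WHITE_RGB.finditer(pdf_bytes))[index]
    black = b"0" + m.group(1) + b"0" + m.group(2) + b"0" + m.group(3) + m.group(4)
    return pdf_bytes[:m.start()] + black + pdf_bytes[m.end():]
```

A possible workflow is to print find_white(data), pick one index, write blacken(data, index) to a new file, and check that page in the viewer before moving on to the next occurrence.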

Upvotes: 2

KenS

Reputation: 31159

Firstly, hidden text in PDF is done with a text rendering mode, not a colour. Text rendering mode 3 is 'neither stroke nor fill', so changing the colour won't help you if this is how the text is drawn. Of course, we can't tell whether this is how the text has been drawn (though I suspect it is), because you haven't made the PDF file publicly available. In almost all cases, if you want to discuss a particular file, the best thing to do is make it public.

Secondly, you can't use PostScript to change a PDF file (well, you could write a PostScript program to interpret the PDF file, but that would be hard...)

Upvotes: 1
