andi

Reputation: 21

Count b/w and color pages in a PDF with PHP

Does anyone know a workable solution for the following?

A PDF file needs to be checked for colored pages. I need to know the total number of black/white pages and the total number of pages with some color on them (images or colored text).

Thanks for any ideas!

More info #1: We mainly expect plain, Word-style PDFs with some images and some colored text elements/boxes. Fully scanned pages are not expected in this process.

Upvotes: 2

Views: 2095

Answers (2)

Kurt Pfeifle

Reputation: 90213

See this answer for a Ghostscript-based tool:

It uses the new inkcov device to determine the distribution of the C (cyan), M (magenta), Y (yellow) and K (black) components (ink coverage) on each page. You'll need Ghostscript 9.05 or newer.

Example command line:

gs -q  -o - -sDEVICE=inkcov temp.pdf
 0.00000  0.00000  0.00000  0.02230 CMYK OK
 0.00000  0.00000  0.00000  0.02360 CMYK OK
 0.00000  0.00000  0.00000  0.02525 CMYK OK
 0.00000  0.00000  0.00000  0.01982 CMYK OK

The output has one line per page, with the columns in C, M, Y, K order. Any page whose C, M and Y values are all zero is black/white only.
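
To turn that output into the two totals the question asks for, the lines can be parsed and classified. The following is a minimal sketch in Python, not part of the original answer: the function names parse_inkcov and count_pages and the tolerance parameter are made up for illustration, and it assumes gs is on the PATH and prints lines in the format shown above.

```python
import subprocess


def parse_inkcov(output, tolerance=0.0):
    """Return (bw_pages, color_pages) from the text printed by the inkcov device.

    tolerance is a hypothetical knob: a page counts as black/white when
    its C, M and Y coverage are all <= tolerance.
    """
    bw = color = 0
    for line in output.splitlines():
        fields = line.split()
        # A page line looks like: C M Y K CMYK OK
        if len(fields) < 5 or fields[4] != "CMYK":
            continue
        try:
            c, m, y = (float(value) for value in fields[:3])
        except ValueError:
            continue  # skip anything that is not a coverage line
        if max(c, m, y) <= tolerance:
            bw += 1
        else:
            color += 1
    return bw, color


def count_pages(pdf_path, gs="gs"):
    """Run the command line from the answer and classify its output."""
    result = subprocess.run(
        [gs, "-q", "-o", "-", "-sDEVICE=inkcov", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_inkcov(result.stdout)
```

Calling count_pages("temp.pdf") would return a (black/white, color) tuple. The same parsing logic should carry over to PHP, for instance by capturing the command output with shell_exec and splitting it line by line.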

Upvotes: 1

Ritsaert Hornstra

Reputation: 5111

Probably the easiest way to do that is to use a tool to render the PDF to a set of images and then use a small program to determine if the colors used in those images are grayscale only or not.

The second step can be performed by loading each image and scanning its pixels. For scanned pages, determining whether something is grayscale is not trivial, since you need to consider the white point and black point of each page, and possibly color casts from lighting at the edges, and so on. I once created a tool to determine whether something is just text or b/w line art by obtaining the 2D histogram of Abs(R - G) and Abs(R - B), fitting a straight line, and checking whether that line and the regression constant were within some predefined ranges.
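
For cleanly generated (non-scanned) PDFs, the pixel scan can be much simpler than the histogram approach. Below is a rough Python sketch of that simpler variant only: it flags a page as colored as soon as one pixel's channels differ by more than a tolerance. It is not the regression method described above. The names is_grayscale and count_pages_by_color are hypothetical, and each page is assumed to already be rendered and available as a sequence of (R, G, B) tuples from whatever imaging library is used.

```python
def is_grayscale(pixels, tolerance=0):
    """Return True if every (r, g, b) pixel is gray within the tolerance.

    A pixel counts as gray when its largest and smallest channel values
    differ by at most `tolerance` (0 means the channels must be equal).
    """
    for r, g, b in pixels:
        if max(r, g, b) - min(r, g, b) > tolerance:
            return False  # one colored pixel is enough
    return True


def count_pages_by_color(pages, tolerance=0):
    """Return (bw_pages, color_pages) for a list of per-page pixel sequences."""
    bw = sum(1 for page in pages if is_grayscale(page, tolerance))
    return bw, len(pages) - bw
```

A small non-zero tolerance may help when rendering introduces slight color fringes (from anti-aliasing or image compression, for example), at the cost of missing very faint colors.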

Upvotes: 0
