Reputation: 21
Does anyone know a workable solution for the following?
A PDF file needs to be checked for colored pages. I need to know the total number of black/white pages and the total number of pages with some color on them (images or colored text).
Thanks for any ideas!
More info #1: We mainly expect plain, Word-like PDFs with some images and some colored text elements/boxes. Fully scanned pages are not expected in this process.
Upvotes: 2
Views: 2095
Reputation: 90213
See this answer for a Ghostscript-based tool:
It uses the new inkcov device to determine the ink coverage of each page, i.e. the distribution of its C (cyan), M (magenta), Y (yellow) and K (black) components. You'll need Ghostscript version 9.05 or newer.
Example command line:
gs -q -o - -sDEVICE=inkcov temp.pdf
0.00000 0.00000 0.00000 0.02230 CMYK OK
0.00000 0.00000 0.00000 0.02360 CMYK OK
0.00000 0.00000 0.00000 0.02525 CMYK OK
0.00000 0.00000 0.00000 0.01982 CMYK OK
Each line of output corresponds to one page. Any page whose C, M and Y values are all zero is black/white only.
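To get the two totals asked for in the question, the inkcov output can be post-processed with a small script. The following is only a sketch: the function names classify_pages and run_inkcov and the tolerance parameter are my own, and it assumes every page line has the "C M Y K CMYK OK" shape shown above and that gs is on the PATH.

```python
import subprocess


def classify_pages(inkcov_output, tolerance=0.0):
    """Split pages into black/white and color, based on inkcov output.

    Returns two lists of 1-based page numbers: (bw_pages, color_pages).
    A page counts as color if any of C, M or Y exceeds `tolerance`.
    """
    bw_pages, color_pages = [], []
    page = 0
    for line in inkcov_output.splitlines():
        fields = line.split()
        # Expected page line: "0.00000 0.00000 0.00000 0.02230 CMYK OK"
        if len(fields) < 5 or fields[4] != "CMYK":
            continue
        try:
            c, m, y = (float(value) for value in fields[:3])
        except ValueError:
            continue  # not a coverage line after all
        page += 1
        if max(c, m, y) > tolerance:
            color_pages.append(page)
        else:
            bw_pages.append(page)
    return bw_pages, color_pages


def run_inkcov(pdf_path):
    """Run the Ghostscript command from the answer and return its stdout."""
    result = subprocess.run(
        ["gs", "-q", "-o", "-", "-sDEVICE=inkcov", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage would be something like bw, color = classify_pages(run_inkcov("temp.pdf")), after which len(bw) and len(color) are the two totals. A small non-zero tolerance may be useful if near-zero C, M or Y values should still count as black/white.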
Upvotes: 1
Reputation: 5111
Probably the easiest way to do that is to use a tool to render the PDF to a set of images and then use a small program to determine if the colors used in those images are grayscale only or not.
The second step can be performed by loading each image and scanning its pixels. For scanned pages, determining whether something is grayscale is not trivial, since you need to consider the white point and black point of each page, and possibly color casts from lighting near the edges, and so on. I once created a tool to determine whether something is just text or b/w line art by computing the 2D histogram of Abs(R - G) and Abs(R - B), fitting a straight line to it, and checking whether that line and the regression constant were within some predefined ranges.
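For the non-scanned case, the pixel scan described above can be sketched roughly as follows. This is not the histogram/regression tool mentioned in this answer, just the simple per-pixel check. It assumes each page has already been rendered to a sequence of (R, G, B) tuples by some other tool; the names is_grayscale and count_pages, and both threshold parameters, are hypothetical.

```python
def is_grayscale(pixels, tolerance=0, max_colored_fraction=0.0):
    """Return True if a rendered page contains (almost) no colored pixels.

    pixels: iterable of (r, g, b) tuples with integer channel values.
    tolerance: largest allowed spread between channels for a pixel
        to still count as gray (0 means R, G and B must be equal).
    max_colored_fraction: share of colored pixels that is still accepted.
    """
    total = 0
    colored = 0
    for r, g, b in pixels:
        total += 1
        # A gray pixel has (nearly) identical R, G and B values.
        if max(r, g, b) - min(r, g, b) > tolerance:
            colored += 1
    if total == 0:
        return True  # an empty page has no color
    return colored / total <= max_colored_fraction


def count_pages(pages, **kwargs):
    """Return (bw_count, color_count) for an iterable of rendered pages."""
    bw_count = 0
    color_count = 0
    for pixels in pages:
        if is_grayscale(pixels, **kwargs):
            bw_count += 1
        else:
            color_count += 1
    return bw_count, color_count
```

With default settings a single colored pixel marks the page as color. Raising tolerance or max_colored_fraction makes the check more forgiving of rendering artifacts such as anti-aliasing fringes, at the cost of possibly missing very small colored elements.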
Upvotes: 0