Naveen

How can I get the number of color pages in a PDF file using C#?

Given a PDF file with both color and black-and-white pages, is there any way in C# to find out which pages are color and which are black and white?

Upvotes: 6

Views: 4612

Answers (5)

HABJAN

Reputation: 9338

There is a solution.

You can parse each page's content stream bytes and look for color operators such as 'rg', 'RG', 'k', 'K', 'sc', 'SC', 'scn' and 'SCN', then read out all the color values and color spaces used on each page.

Take a look at this example: http://habjan.blogspot.com/2013/09/proof-of-concept-converting-pdf-files.html

It implements and parses all the color operators, and I think it will be a good starting point and reference for coding what you need.
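
The operator scan described above can be sketched roughly as follows. This is a minimal sketch, not the code from the linked post: it assumes you have already extracted and decompressed a page's content stream into a string with whatever PDF library you use, and `ColorOperatorScanner` and `UsesColor` are hypothetical names. It only judges 'rg'/'RG' (DeviceRGB) and 'k'/'K' (DeviceCMYK); it does not resolve 'sc'/'scn' against the current color space, and it does not look inside images, shadings, form XObjects or inline image data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ColorOperatorScanner
{
    private const double Tolerance = 1e-6;

    // Returns true if a decompressed page content stream sets a non-gray color
    // with rg/RG (DeviceRGB fill/stroke) or k/K (DeviceCMYK fill/stroke).
    public static bool UsesColor(string content)
    {
        var operands = new List<double>();
        foreach (string token in Tokenize(content))
        {
            if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double value))
            {
                operands.Add(value); // numbers are operands of the next operator
                continue;
            }

            int n = operands.Count;
            if ((token == "rg" || token == "RG") && n >= 3 &&
                !IsGrayRgb(operands[n - 3], operands[n - 2], operands[n - 1]))
                return true;
            if ((token == "k" || token == "K") && n >= 4 &&
                !IsBlackInkOnly(operands[n - 4], operands[n - 3], operands[n - 2]))
                return true;
            // sc/SC/scn/SCN are skipped: their operands depend on the color space
            // selected earlier with cs/CS, which this sketch does not track.
            operands.Clear();
        }
        return false;
    }

    private static bool IsGrayRgb(double r, double g, double b) =>
        Math.Abs(r - g) < Tolerance && Math.Abs(g - b) < Tolerance;

    // Any cyan, magenta or yellow component counts as color; K alone is black and white.
    private static bool IsBlackInkOnly(double c, double m, double y) =>
        Math.Abs(c) < Tolerance && Math.Abs(m) < Tolerance && Math.Abs(y) < Tolerance;

    // Very small tokenizer: skips comments, literal strings and hex strings so that
    // text such as "(1 0 0 rg) Tj" is not mistaken for a color operator.
    private static IEnumerable<string> Tokenize(string s)
    {
        int i = 0;
        while (i < s.Length)
        {
            char c = s[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (c == '%') // comment: runs to the end of the line
            {
                while (i < s.Length && s[i] != '\n' && s[i] != '\r') i++;
                continue;
            }
            if (c == '(') // literal string: may nest and contain backslash escapes
            {
                int depth = 0;
                do
                {
                    if (s[i] == '\\') i++;       // skip the escaped character
                    else if (s[i] == '(') depth++;
                    else if (s[i] == ')') depth--;
                    i++;
                } while (i < s.Length && depth > 0);
                yield return "(string)";
                continue;
            }
            if (c == '<' || c == '>')
            {
                if (i + 1 < s.Length && s[i + 1] == c) // dictionary delimiter << or >>
                {
                    yield return new string(c, 2);
                    i += 2;
                }
                else if (c == '<') // hex string: skip to the closing '>'
                {
                    int end = s.IndexOf('>', i);
                    i = end < 0 ? s.Length : end + 1;
                    yield return "<hex>";
                }
                else
                {
                    yield return ">";
                    i++;
                }
                continue;
            }
            if (c == '[' || c == ']' || c == '{' || c == '}')
            {
                yield return c.ToString();
                i++;
                continue;
            }
            // Regular token: number, name (/Foo) or operator.
            int start = i++;
            while (i < s.Length && !char.IsWhiteSpace(s[i]) && "()<>[]{}/%".IndexOf(s[i]) < 0) i++;
            yield return s.Substring(start, i - start);
        }
    }
}
```

Because it stops at the first non-gray color it sees, a page that sets a color but never paints anything with it is still reported as color, so treat the result as a quick first pass.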

Upvotes: 0

Jay

Reputation: 1889

Convert each page into a bitmap image and then look through each pixel in the image; that way you can catch colors and tell the color pages apart from the black-and-white ones.

Refer to this post for more details.

Note: If you are detecting these colors for printing purposes, then you have to detect CMYK colors, not RGB. CMYK is the standard color mode for printers, and RGB is a display color mode.
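
For the CMYK case, the per-pixel test might look like the minimal sketch below. It assumes the page has already been rendered to an interleaved 8-bit CMYK buffer (4 bytes per pixel, in C, M, Y, K order) by some renderer; `CmykPageCheck`, `IsColorPage` and the `threshold` parameter are hypothetical names. A page is treated as black and white when it uses only black (K) ink.

```csharp
using System;

public static class CmykPageCheck
{
    // `pixels` is assumed to hold one rendered page as interleaved 8-bit CMYK,
    // 4 bytes per pixel in C, M, Y, K order. Returns true if any pixel uses
    // cyan, magenta or yellow ink above `threshold` (0-255).
    public static bool IsColorPage(byte[] pixels, byte threshold = 0)
    {
        if (pixels == null) throw new ArgumentNullException(nameof(pixels));
        if (pixels.Length % 4 != 0)
            throw new ArgumentException("Expected 4 bytes (C, M, Y, K) per pixel.", nameof(pixels));

        for (int i = 0; i < pixels.Length; i += 4)
        {
            // K (pixels[i + 3]) is ignored: black ink alone keeps the page black and white.
            if (pixels[i] > threshold || pixels[i + 1] > threshold || pixels[i + 2] > threshold)
                return true;
        }
        return false;
    }
}
```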

Upvotes: 0

Douglas Anderson

Reputation: 4690

Check out:

PDF-Analyser

I use his tools for text extraction and PDF analysis. They are very inexpensive, royalty-free, and work well. I think GetPDFColourStyle, part of the PDFLayoutPlus library, should do the trick.

Upvotes: 0

Rowan

Reputation: 2430

My recommendation is to render each page to an image and then check each pixel for RGB values that are not equal to each other. If R = G = B for every pixel, then it's a grayscale image.

You could then perform actions on the pages (such as extracting a page to another document or printing it) based on whether they are color or black and white.

This can be achieved by using my company's PDF developer library, Quick PDF, or potentially one of the open source PDF libraries that Kenneth suggested.
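
The R = G = B test can be sketched without tying it to any particular PDF library, assuming each page has already been rendered into an interleaved 8-bit buffer with the three color channels first and any alpha last (RGB, RGBA or BGRA; the equality check works in any channel order). `RgbPageCheck`, `IsGrayscale`, `CountColorPages` and the `tolerance` parameter are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RgbPageCheck
{
    // `pixels` holds one rendered page, `bytesPerPixel` bytes per pixel (3 or 4),
    // color channels first. `tolerance` lets nearly-gray pixels count as gray.
    public static bool IsGrayscale(byte[] pixels, int bytesPerPixel = 4, int tolerance = 0)
    {
        if (pixels == null) throw new ArgumentNullException(nameof(pixels));
        if (bytesPerPixel < 3) throw new ArgumentOutOfRangeException(nameof(bytesPerPixel));

        for (int i = 0; i + 2 < pixels.Length; i += bytesPerPixel)
        {
            int r = pixels[i], g = pixels[i + 1], b = pixels[i + 2];
            if (Math.Abs(r - g) > tolerance || Math.Abs(g - b) > tolerance || Math.Abs(r - b) > tolerance)
                return false; // channels differ: this is a color pixel
        }
        return true;
    }

    // Counts the pages whose rendering contains at least one color pixel.
    public static int CountColorPages(IEnumerable<byte[]> renderedPages, int bytesPerPixel = 4, int tolerance = 0) =>
        renderedPages.Count(page => !IsGrayscale(page, bytesPerPixel, tolerance));
}
```

A small tolerance can help because JPEG compression artifacts in embedded images can produce pixels that are almost, but not exactly, gray.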

Upvotes: 1

jmucchiello

Reputation: 18984

Short of parsing all the PostScript-style content of each page, probably not. There's no flag on a PDF page that says whether it is black and white or color, so you'd have to check the color of every element placed on the page to figure out whether it is color. I'm not sure what libraries exist for reading PDFs in C#, but you would need one that reads all the elements.

Similarly, any images on the page would need to be checked for color, and that is not simple: a color image format can hold a black-and-white image.

Upvotes: 0
