Adi Peer

Reputation: 61

Ghostscript: get CMYK values for rendering from PDF

I need to get the CMYK values used for rendering from the PDF.

I think they are values in the range 0 to 1.0, stored under the C1 key.

Does anyone know how to get them?

Upvotes: 1

Views: 671

Answers (1)

KenS

Reputation: 31141

The CMYK values have nothing to do with the 'C1' key. There may be a colour space defined as /C1, but it will not contain CMYK values.

Any object may be defined in a variety of colour spaces (Gray, RGB, CMYK, sRGB, Separation, DeviceN, NChannel, ICC and some special spaces). For those spaces which are not a device space (i.e. not Gray, RGB or CMYK), the colour is first converted into one of the device spaces. The PDF reference then defines rules for how the device spaces are converted between themselves.

The actual colour components of an object will be defined in the content stream of the object (for vector objects in a page or Form context) or in the binary data (for images), or calculated from a function (Shading dictionaries).
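As a rough illustration of the vector case: in a decompressed content stream, the K and k operators set a DeviceCMYK stroking and non-stroking colour directly, each taking four operands. The sketch below only scans for those two operators. The function name and the sample stream are made up for the example, and it assumes whitespace-separated tokens with no string literals, comments or inline image data. Colours set through named colour spaces (cs/sc/scn) are not handled at all.

```python
# Operators that set a DeviceCMYK colour directly:
# K = stroking colour, k = non-stroking (fill) colour.
CMYK_OPERATORS = {"K": "stroke", "k": "fill"}


def find_device_cmyk(content):
    """Yield (usage, (c, m, y, k)) for each K/k operator found.

    `content` is an already decompressed content stream as text.
    Hypothetical helper: a naive token scan, not a real PDF parser.
    """
    operands = []
    for token in content.split():
        if token in CMYK_OPERATORS:
            if len(operands) >= 4:
                yield CMYK_OPERATORS[token], tuple(operands[-4:])
            operands = []
        else:
            try:
                operands.append(float(token))
            except ValueError:
                # Any other operator or non-numeric operand: start over.
                operands = []


# Made-up sample stream: black stroke, a CMYK fill, then a filled rectangle.
stream = "0 0 0 1 K 0.1 0.2 0.3 0.4 k 10 10 100 50 re f"
colours = list(find_device_cmyk(stream))
for usage, cmyk in colours:
    print(usage, cmyk)
```

Anything more than a quick inspection needs a proper PDF parser, because the colour in effect also depends on the graphics state stack (q/Q) and on the resources of the page or Form.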

In order to find any of these you will need to read the PDF file, decompressing streams as required, locate the object you want the information for and then determine the current colour space. Then you can convert the colour components from whatever colour space the object is defined in into CMYK.
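For the device spaces, the last step can be sketched with the simple conversion formulas from the PDF reference. The black generation (BG) and undercolour removal (UCR) functions are device dependent; the identity defaults used here are an assumption for illustration, as are the function names. This is not colour managed, so the numbers will generally differ from what a renderer using ICC profiles produces.

```python
def _clamp(value):
    """Limit a colour component to the range 0.0 to 1.0."""
    return min(1.0, max(0.0, value))


def rgb_to_cmyk(r, g, b, bg=lambda k: k, ucr=lambda k: k):
    """DeviceRGB to DeviceCMYK, all components in 0.0 to 1.0.

    bg and ucr stand in for the black generation and undercolour
    removal functions; the identity defaults are an assumption.
    """
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)
    return (
        _clamp(c - ucr(k)),
        _clamp(m - ucr(k)),
        _clamp(y - ucr(k)),
        _clamp(bg(k)),
    )


def gray_to_cmyk(gray):
    """DeviceGray to DeviceCMYK: only the black component is used."""
    return (0.0, 0.0, 0.0, 1.0 - gray)


print(rgb_to_cmyk(1.0, 0.0, 0.0))  # pure red
print(gray_to_cmyk(0.25))
```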

Perhaps if you explained what your actual goal is it might be possible to be more helpful.

[UPDATE]

You could simply use Ghostscript's pdfwrite device to create a new grayscale PDF by setting ColorConversionStrategy=Gray.

This has the advantage of working with all elements of the PDF, not only images.

You do realise that a PDF file does not normally consist solely of a raster image? There can be text, linework and shadings, and transparency groups can also be defined as operating in a given colour space. This is not a simple task.
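A minimal sketch of driving that conversion from a script, assuming the Ghostscript executable is on the PATH as `gs` (on Windows it is usually `gswin64c`). The function names and file names are placeholders. The ProcessColorModel switch was needed alongside ColorConversionStrategy by older releases; I believe newer ones set it themselves, so treat it as optional.

```python
import subprocess


def gray_pdf_command(src, dst, gs="gs"):
    """Build a Ghostscript command line that rewrites src as a gray PDF.

    `gs` is the executable name, assumed to be on the PATH.
    """
    return [
        gs,
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=Gray",
        # Older releases wanted this as well; likely redundant on newer ones.
        "-dProcessColorModel=/DeviceGray",
        f"-sOutputFile={dst}",
        src,
    ]


def convert_to_gray(src, dst):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(gray_pdf_command(src, dst), check=True)


print(" ".join(gray_pdf_command("in.pdf", "out.pdf")))
```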

If you are really only dealing with images then the ColorSpace is defined in the image dictionary (it may be an indirect reference). You will have to parse the PDF file (potentially decompressing it) to find the colour space definition. The sample values for each component are then given by the image data. These range from 0 to 2^BPC - 1, where BPC is the BitsPerComponent entry in the image dictionary (1, 2, 4, 8 or 16, so at most 0-65535), and you will have to apply the Decode array to map the values into a range suitable for the colour space.
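The Decode mapping is a linear interpolation: each component has a pair (Dmin, Dmax) in the Decode array, and a raw sample x with n bits per component maps to Dmin + x * (Dmax - Dmin) / (2^n - 1). A minimal sketch, assuming the samples have already been unpacked from the image data into integers; the function names are made up for the example.

```python
def decode_sample(sample, bits_per_component, d_min, d_max):
    """Map one raw sample onto the range given by its Decode pair."""
    max_sample = (1 << bits_per_component) - 1
    return d_min + sample * (d_max - d_min) / max_sample


def decode_pixel(samples, bits_per_component, decode):
    """Apply a Decode array [Dmin0, Dmax0, Dmin1, Dmax1, ...] to one pixel.

    `samples` holds one raw integer per colour component.
    """
    pairs = zip(decode[0::2], decode[1::2])
    return [
        decode_sample(sample, bits_per_component, d_min, d_max)
        for sample, (d_min, d_max) in zip(samples, pairs)
    ]


# An 8-bit CMYK pixel with the usual Decode array for DeviceCMYK.
print(decode_pixel([0, 255, 51, 255], 8, [0, 1, 0, 1, 0, 1, 0, 1]))
# The same pixel with an inverted Decode array.
print(decode_pixel([0, 255, 51, 255], 8, [1, 0, 1, 0, 1, 0, 1, 0]))
```

An inverted Decode array such as [1 0 1 0 1 0 1 0] flips every component, which is why ignoring Decode can produce a negative of the image.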

If you then want to convert to grayscale, you will have to apply a conversion to Gray. Complex spaces will include a method to map to a device space, and the conversion between device spaces is covered in the PDF reference manual. For ICCBased spaces you will need an ICC colour management engine; you might like to consider LCMS, or you could write your own.
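For the device spaces, the conversions to Gray given in the PDF reference are simple weighted sums. A minimal sketch with all components in the range 0.0 to 1.0 (the function names are made up for the example); it does not cover ICCBased or other complex spaces, which need the alternate space or a colour management engine as described above.

```python
def rgb_to_gray(r, g, b):
    """DeviceRGB to DeviceGray using the NTSC luminance weights."""
    return 0.3 * r + 0.59 * g + 0.11 * b


def cmyk_to_gray(c, m, y, k):
    """DeviceCMYK to DeviceGray; the result is clamped at 0.0 (black)."""
    return 1.0 - min(1.0, 0.3 * c + 0.59 * m + 0.11 * y + k)


print(rgb_to_gray(1.0, 0.0, 0.0))
print(cmyk_to_gray(0.0, 0.0, 0.0, 1.0))
```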

Upvotes: 2
