Silvio

Reputation: 69

How to detect colors in a PDF with Python

Is there a way, in Python, to automatically detect the colors in a certain area of a PDF and either convert them to RGB or compare them to the legend to identify the color?

Upvotes: 5

Views: 8932

Answers (2)

黄雨伞

Reputation: 1974

Felipe's approach didn't work for me, but I came up with this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import minecart

colors = set()

with open("file.pdf", "rb") as pdf_file:
    document = minecart.Document(pdf_file)
    page = document.get_page(0)  # first page (zero-based index)
    for shape in page.shapes:
        # Skip shapes that have no fill
        if shape.fill:
            colors.add(shape.fill.color.as_rgb())

# print() with a single argument works on both Python 2 and Python 3
for color in colors:
    print(color)

This prints every unique RGB fill value found on the first page of your document (you could extend it to all pages, of course).
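Extending this to every page might look like the sketch below. It assumes that `minecart.Document` exposes an `iter_pages()` method; check that against the version you have installed. The helper name `collect_fill_colors` is made up for illustration, and the function only relies on `iter_pages()`, `page.shapes`, `shape.fill` and `as_rgb()`.

```python
def collect_fill_colors(document):
    """Return the set of unique fill colors (as RGB tuples) across all pages.

    Assumes `document` behaves like a minecart.Document, i.e. it has an
    iter_pages() method (an assumption; verify with your minecart version).
    """
    colors = set()
    for page in document.iter_pages():
        for shape in page.shapes:
            # Skip shapes that have no fill (shape.fill is falsy)
            if shape.fill:
                colors.add(tuple(shape.fill.color.as_rgb()))
    return colors
```

With minecart this would be called as `collect_fill_colors(minecart.Document(pdf_file))` while the file is still open.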

Upvotes: 4

Felipe

Reputation: 3149

Depending on where you want to extract the information from, you can use minecart. It has really robust support for colors and allows easy conversion to RGB. Though you can't input a coordinate and get the color value there, if you are trying to get color information from a shape you could do something like the following:

import minecart

# Bounding box in PDF points (72 points = 1 inch)
BOX = (.5 * 72,  # left bounding box edge
       9 * 72,   # bottom bounding box edge
       1 * 72,   # right bounding box edge
       10 * 72)  # top bounding box edge

with open("my-doc.pdf", "rb") as pdf_file:
    doc = minecart.Document(pdf_file)
    page = doc.get_page(0)
    for shape in page.shapes:
        # Shapes without a fill have no color to read, so skip them
        if shape.fill and shape.check_in_bbox(BOX):
            r, g, b = shape.fill.color.as_rgb()
            # do stuff with r, g, b
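
For the "compare them to the legend" part of the question, one option is a nearest-color lookup on the extracted RGB values. The following is a minimal sketch, not part of minecart: the function name `nearest_legend_label` and the `LEGEND` entries are hypothetical, and it assumes the legend colors are on the same scale as the values returned by `as_rgb()`.

```python
def nearest_legend_label(rgb, legend):
    """Return the legend label whose color is closest to `rgb`.

    Closeness is squared Euclidean distance in RGB space, a simple
    heuristic rather than a perceptual color difference.
    """
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(rgb, legend[label]))
    return min(legend, key=distance)


# Hypothetical legend; replace with the colors from your document's legend.
# Values are assumed to be on the same scale as what as_rgb() returns.
LEGEND = {
    "low": (0.0, 0.0, 1.0),
    "medium": (1.0, 1.0, 0.0),
    "high": (1.0, 0.0, 0.0),
}

print(nearest_legend_label((0.9, 0.1, 0.05), LEGEND))  # high
```

Inside the loop you would pass `(r, g, b)` in place of the literal tuple.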

[Disclaimer: I'm the author of minecart]

Upvotes: 2
