SanthoshSolomon

Reputation: 1402

Get a picture - Python-pptx

I am trying to read a .pptx file using python-pptx. I managed to get all the content from the presentation except the images. Below is the code I used to find shapes that are not text frames. For the image, auto_shape_type comes back as RECTANGLE (1), but I get no information about the image itself.

from pptx import Presentation
from pptx.shapes.picture import Picture

def read_ppt(file):
    prs = Presentation(file)
    for slide_no, slide in enumerate(prs.slides):
        for shape in slide.shapes:
            if not shape.has_text_frame:
                print(shape.auto_shape_type)

Any help understanding this problem is appreciated. Alternative approaches are also welcome.

Upvotes: 3

Views: 2257

Answers (1)

David Zemens

Reputation: 53663

Try querying shape.shape_type instead. By default, auto_shape_type returns RECTANGLE for a picture, as you've observed, though pictures can also be inserted into and masked by other shapes.

Note the default value for a newly-inserted picture is MSO_AUTO_SHAPE_TYPE.RECTANGLE, which performs no cropping because the extents of the rectangle exactly correspond to the extents of the picture.

For a picture, shape_type should return:

Unique integer identifying the type of this shape, unconditionally MSO_SHAPE_TYPE.PICTURE in this case.

You can extract the image content to a file by using its blob property and writing out the binary:

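from pptx import Presentation

pres = Presentation('ppt_image.pptx')
slide = pres.slides[0]
shape = slide.shapes[0]  # assumes the first shape on the first slide is a picture
image = shape.image
blob = image.blob  # raw bytes of the embedded image
ext = image.ext  # file extension for the image format, e.g. 'png'
with open(f'image.{ext}', 'wb') as file:
    file.write(blob)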

Upvotes: 2
