Reputation: 1402
I am trying to read a .pptx file using python-pptx. I managed to get all of the content except the images from the presentation. Below is the code I used to identify shapes other than text frames in the presentation. For those shapes, auto_shape_type is reported as RECTANGLE (1), but I get nothing about the image itself.
from pptx import Presentation
from pptx.shapes.picture import Picture

def read_ppt(file):
    prs = Presentation(file)
    for slide_no, slide in enumerate(prs.slides):
        for shape in slide.shapes:
            if not shape.has_text_frame:
                print(shape.auto_shape_type)
Any help in understanding this problem would be appreciated. Alternative options are also welcome.
Upvotes: 3
Views: 2257
Reputation: 53663
Try querying shape.shape_type instead. By default, auto_shape_type returns RECTANGLE for a picture, as you've observed, though pictures can be inserted into and masked by other shapes as well. The documentation for auto_shape_type says:
"Note the default value for a newly-inserted picture is MSO_AUTO_SHAPE_TYPE.RECTANGLE, which performs no cropping because the extents of the rectangle exactly correspond to the extents of the picture."
For a picture, shape_type should return:
"Unique integer identifying the type of this shape, unconditionally MSO_SHAPE_TYPE.PICTURE in this case."
You can extract the image content to a file by using its blob property and writing out the binary:
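from pptx import Presentation

pres = Presentation('ppt_image.pptx')
slide = pres.slides[0]
shape = slide.shapes[0]  # assumes the first shape on the first slide is the picture
image = shape.image
blob = image.blob  # raw bytes of the embedded image file
ext = image.ext  # file extension for the image, e.g. 'png'
with open(f'image.{ext}', 'wb') as file:
    file.write(blob)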
Upvotes: 2