Auro

Reputation: 135

Extracting images from presentation file

I am working with the python-pptx package. In my code I need to extract all the images contained in a presentation file. Can anybody help me with this?

Thanks in advance for help.

My code looks like this:

import pptx

prs = pptx.Presentation(filename)

for slide in prs.slides:
    for shape in slide.shapes:
        print(shape.shape_type)

Printing shape_type shows that PICTURE (13) shapes are present in the presentation, but I want the pictures themselves extracted into the folder where the script lives.

Upvotes: 7

Views: 12727

Answers (5)

vladmihaisima

Reputation: 2248

While the solution by Jason Furtney (https://stackoverflow.com/a/53855752/83037) worked in more cases, it did not work in all of them, so I added some code to handle picture placeholders (not every placeholder needed this: some worked without it, others did not). Tested with python-pptx==0.6.23.

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
from pptx.shapes.placeholder import PlaceholderPicture
import sys


def write_image(shape, slide_idx, image_idx):
    image = shape.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    image_filename = f'slide{slide_idx}_image{image_idx:03d}.{image.ext}'
    image_idx += 1
    print(image_filename)
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)
    return image_idx


def visitor(shape, slide_idx, image_idx):
    if shape.shape_type == MSO_SHAPE_TYPE.PLACEHOLDER:
        if isinstance(shape, PlaceholderPicture):
            # write_image returns only the updated counter
            image_idx = write_image(shape, slide_idx, image_idx)
    if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
        for s in shape.shapes:
            image_idx = visitor(s, slide_idx, image_idx)
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        image_idx = write_image(shape, slide_idx, image_idx)
    return image_idx


def iter_picture_shapes(prs):
    img_count = 0
    for idx, slide in enumerate(prs.slides):
        for shape in slide.shapes:
            img_count = visitor(shape, idx, img_count)


filename = sys.argv[1]
iter_picture_shapes(Presentation(filename))

Upvotes: 1

chesney C.

Reputation: 303

A PowerPoint presentation is just a zip file. Rename the .pptx to .zip and you can browse its contents like any other archive.

Unzip the file, locate the media folder (ppt/media inside the archive), and copy the image files out of it; this takes only a few lines of code. There is no need for python-pptx here, although it is a great library for creating .pptx files.
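For what it's worth, here is a minimal sketch of the zip approach using only the standard library. The function name extract_media and the out_dir parameter are my own, not part of any library. It assumes the images sit under ppt/media/ in the archive, and note that the same folder can also hold embedded audio or video, which this copies too.

```python
import zipfile
from pathlib import Path


def extract_media(pptx_path, out_dir="."):
    """Copy every file under ppt/media/ in the archive into out_dir.

    Hypothetical helper for illustration; returns the paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # A .pptx opens as a zip archive without being renamed first.
    with zipfile.ZipFile(pptx_path) as archive:
        for member in archive.namelist():
            # Skip everything outside the media folder, and directory entries.
            if member.startswith("ppt/media/") and not member.endswith("/"):
                target = out_dir / Path(member).name
                target.write_bytes(archive.read(member))
                written.append(target)
    return written
```

Calling extract_media("deck.pptx", "images") would then write files such as image1.png into an images folder. The names are whatever PowerPoint stored, so they do not tell you which slide an image came from.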

Upvotes: 5

Jason Furtney

Reputation: 191

The solution by scanny did not work for me because I had image elements in group elements. This worked for me:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

n = 0  # running counter used to build unique file names
def write_image(shape):
    global n
    image = shape.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    image_filename = 'image{:03d}.{}'.format(n, image.ext)
    n += 1
    print(image_filename)
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)

def visitor(shape):
    if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
        for s in shape.shapes:
            visitor(s)
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        write_image(shape)

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            visitor(shape)

iter_picture_shapes(Presentation(filename))

Upvotes: 5

scanny

Reputation: 28883

A Picture (shape) object in python-pptx provides access to the image it displays:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                yield shape

for picture in iter_picture_shapes(Presentation(filename)):
    image = picture.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    image_filename = 'image.%s' % image.ext
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)

Generating a unique file name is left to you as an exercise. All the other bits you need are here.

More details on the Image object are available in the documentation here:
https://python-pptx.readthedocs.io/en/latest/api/image.html#image-objects
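One possible way to do the unique-file-name exercise, as a rough sketch: derive the name from a hash of the image bytes, so identical images reused on several slides end up in a single file. The helper name unique_image_filename is made up for illustration; you would pass it image.blob and image.ext from the loop above.

```python
import hashlib


def unique_image_filename(image_bytes, ext):
    """Build a file name from the image content (hypothetical helper).

    The same bytes always give the same name; different bytes give a
    different name, barring an extremely unlikely hash-prefix collision.
    """
    digest = hashlib.sha1(image_bytes).hexdigest()
    # Twelve hex characters keep the name short while staying distinctive.
    return "image_{}.{}".format(digest[:12], ext)
```

If you would rather keep the slide order visible in the names, a plain counter works just as well as a hash.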

Upvotes: 14

Aravind

Reputation: 543

Use this PPTExtractor repo for reference.

ppt = PPTExtractor("some/PowerPointFile")
# found images
len(ppt)
# image list
images = ppt.namelist()
# extract image
ppt.extract(images[0])

# save image with different name
ppt.extract(images[0], "nuevo-nombre.png")
# extract all images
ppt.extractall()

Save images in a different directory:

ppt.extract("image.png", path="/another/directory")
ppt.extractall(path="/another/directory")

Upvotes: 1
