user3818875

Reputation:

Finding images present in a docx file using Python

How can I find the images present in a document file? Is there any module for this in Python? I searched but found nothing useful. This is how I read text from a Word file; the code below gives no information about the images present in the file:

from docx import Document

# Open the document and print the text of every paragraph
document = Document('new-file-name.docx')
for par in document.paragraphs:
    print(par.text)

Upvotes: 5

Views: 10025

Answers (2)

thalescr

Reputation: 131

You'll first have to extract all image files from the .docx (which is a zip archive), then look for image elements in the document's XML and relate each image to its rId.

import os
import docx
import docx2txt

img_folder = 'img_folder/'

# Extract the images to img_folder/
docx2txt.process('document.docx', img_folder)

# Open your .docx document
doc = docx.Document('document.docx')

# Save all 'rId: filename' relationships in a dictionary named rels
rels = {}
for r in doc.part.rels.values():
    if isinstance(r._target, docx.parts.image.ImagePart):
        rels[r.rId] = os.path.basename(r._target.partname)

# Then process your text
for paragraph in doc.paragraphs:
    xml = paragraph._p.xml
    # If you find an image
    if 'Graphic' in xml:
        # Get the rId of the image; match it with its quotes so that
        # "rId1" does not also match "rId10"
        for rId in rels:
            if '"%s"' % rId in xml:
                # Your image will be in os.path.join(img_folder, rels[rId])
                print(os.path.join(img_folder, rels[rId]))
    else:
        # It's not an image
        print(paragraph.text)
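
For comparison, the same rId lookup can be sketched with only the standard library, by reading the relationships file and the document XML straight out of the archive. This is a rough sketch, not the code from the repository below: the function name `images_in_document_order` is made up for illustration, and it assumes the main part is `word/document.xml` and that the file uses the usual (transitional) OOXML namespaces.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by the packaging and relationship markup
# (assumption: transitional OOXML, not the "strict" variant).
REL_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'
R_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
IMAGE_TYPE = R_NS + '/image'


def images_in_document_order(docx_path):
    """Return archive paths of embedded images in the order they are referenced."""
    with zipfile.ZipFile(docx_path) as archive:
        rels_root = ET.fromstring(archive.read('word/_rels/document.xml.rels'))
        doc_root = ET.fromstring(archive.read('word/document.xml'))

    # Map rId -> path inside the archive, keeping only embedded images.
    rels = {}
    for rel in rels_root.iter('{%s}Relationship' % REL_NS):
        if rel.get('Type') == IMAGE_TYPE and rel.get('TargetMode') != 'External':
            target = posixpath.join('word', rel.get('Target'))
            rels[rel.get('Id')] = posixpath.normpath(target).lstrip('/')

    # Any element carrying an r:embed attribute (for example a:blip)
    # points at one of the relationships collected above.
    embed_attr = '{%s}embed' % R_NS
    found = []
    for element in doc_root.iter():
        r_id = element.get(embed_attr)
        if r_id in rels:
            found.append(rels[r_id])
    return found
```

Calling `images_in_document_order('document.docx')` should give paths such as `word/media/image1.png`, which can then be read with `zipfile.ZipFile.read`. Images that live in headers or footers have their own relationship files and are not covered by this sketch.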

GitHub Repository Link: django-docx-import

Upvotes: 13

NorthCat

Reputation: 9937

Since .docx files are zip files, you can use the zipfile module:

import zipfile

z = zipfile.ZipFile("1.docx")

# Print the list of valid attributes for the ZipFile object
print(dir(z))

# Print all files in the zip archive
all_files = z.namelist()
print(all_files)

# Get all files in the word/media/ directory
images = [name for name in all_files if name.startswith('word/media/')]
print(images)

# Open an image and save it
image1 = z.open('word/media/image1.jpeg').read()
with open('image1.jpeg', 'wb') as f:
    f.write(image1)

# Extract a file
z.extract('word/media/image1.jpeg', r'path_to_dir')
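
The steps above hard-code `image1.jpeg`; a small wrapper that handles whatever is stored under `word/media/` might look like the sketch below. The function names `list_docx_images` and `extract_docx_images` are made up for illustration, and the sketch assumes everything in `word/media/` is an image, which is usual but not guaranteed (other embedded media can end up there too).

```python
import os
import zipfile


def list_docx_images(docx_path):
    """Return the archive names of all files stored under word/media/."""
    with zipfile.ZipFile(docx_path) as archive:
        return [name for name in archive.namelist()
                if name.startswith('word/media/') and not name.endswith('/')]


def extract_docx_images(docx_path, out_dir):
    """Copy every media file into out_dir and return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in list_docx_images(docx_path):
            # Write under the bare file name, dropping the word/media/ prefix.
            target = os.path.join(out_dir, os.path.basename(name))
            with open(target, 'wb') as handle:
                handle.write(archive.read(name))
            written.append(target)
    return written
```

An empty list from `list_docx_images("1.docx")` then answers the original question directly: the document contains no embedded media.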

Upvotes: 15
