Finding images present in a docx file using Python

Question

How can I find the images present in a Word document? Is there a Python module for this? I searched but found nothing useful. This is how we can read text from a Word file, but the code below gives no information about the images in the file:

 from docx import Document

 # Open the document and print the text of every paragraph
 document = Document('new-file-name.docx')
 paragraphs = document.paragraphs
 for par in paragraphs:
     print(par.text)

NorthCat · Accepted Answer

Since .docx files are zip archives, you can use the standard-library zipfile module:

import zipfile

# A .docx file is a zip archive, so open it like any other zip file
z = zipfile.ZipFile("1.docx")

# Print the list of valid attributes for the ZipFile object
print(dir(z))

# Print all files in the zip archive
all_files = z.namelist()
print(all_files)

# Get all files in the word/media/ directory
images = [name for name in all_files if name.startswith('word/media/')]
print(images)

# Open an image and save it
image1 = z.open('word/media/image1.jpeg').read()
with open('image1.jpeg', 'wb') as f:
    f.write(image1)

# Extract a file from the archive into a directory
z.extract('word/media/image1.jpeg', r'path_to_dir')

# Close the archive when done
z.close()
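
If you do not know the image names in advance, the steps above can be wrapped in two small helpers. This is only a sketch: list_docx_images and extract_docx_images are hypothetical names, not part of zipfile or python-docx, and it assumes that everything stored under word/media/ is an image, which is usually but not necessarily the case.

```python
import os
import zipfile


def list_docx_images(docx_path):
    """Return the archive names of all files under word/media/ in a .docx."""
    with zipfile.ZipFile(docx_path) as archive:
        return [name for name in archive.namelist()
                if name.startswith("word/media/") and not name.endswith("/")]


def extract_docx_images(docx_path, out_dir):
    """Copy every file under word/media/ into out_dir; return the new paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip anything outside word/media/ and any directory entries
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            # Write the raw bytes under the original file name (image1.jpeg, ...)
            target = os.path.join(out_dir, os.path.basename(name))
            with open(target, "wb") as out_file:
                out_file.write(archive.read(name))
            saved.append(target)
    return saved
```

For example, list_docx_images("1.docx") returns the archive names of the embedded files, and an empty list means the document has no embedded media. extract_docx_images("1.docx", "images") writes them flat into an images directory rather than recreating the word/media/ folder structure, which is what z.extract would do.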
