Extract PDF OLE Object in MS Word using win32com python

Question

This is my very first question here.

I have a lot of MS Word files with one or more PDFs inserted as objects. I need to process all the Word files and extract the PDFs, saving them as PDF files, while leaving each MS Word file just as I found it. So far I have this code, which I use to test it on one file:

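import win32com.client as win32

word = win32.Dispatch('Word.Application')
word.Visible = False
# Raw string so the backslashes in the Windows path are not treated as escapes;
# ReadOnly=True so the source document is left exactly as it was
doc1 = word.Documents.Open(r'C:\word_merge\docx_con_pdfs.docx', ReadOnly=True)
for s in doc1.InlineShapes:
    # Type 1 = wdInlineShapeEmbeddedOLEObject; pictures and other shapes have no OLE class
    if s.Type == 1 and s.OLEFormat.ClassType == 'AcroExch.Document.DC':
        s.OLEFormat.DoVerb()  # opens the embedded PDF in Adobe Reader
_ = input("Hit Enter to Quit")
doc1.Close(SaveChanges=0)  # 0 = wdDoNotSaveChanges: never write back to the .docx
word.Quit()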
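The loop above handles a single file, while the goal is to process all the Word files without modifying them. Assuming the files are .docx (legacy binary .doc files are not ZIP packages, so this does not apply to them), a minimal stdlib-only sketch of the batch side could collect the files and open each one read-only as a ZIP archive, where embedded OLE objects are normally stored as parts under word/embeddings/. The helper names find_word_files and list_embedded_parts are hypothetical, and skipping names that start with "~$" assumes those are the temporary owner files Word creates next to open documents.

```python
import zipfile
from pathlib import Path


def find_word_files(root):
    """Return .docx files under root (recursively), skipping Word's '~$' owner files."""
    return sorted(
        p for p in Path(root).rglob("*.docx")
        if not p.name.startswith("~$")
    )


def list_embedded_parts(docx_path):
    """List the parts stored under word/embeddings/ in a .docx package.

    The archive is opened in read mode only, so the Word file is never modified.
    """
    with zipfile.ZipFile(docx_path) as zf:
        return [name for name in zf.namelist() if name.startswith("word/embeddings/")]
```

For example, `for path in find_word_files(r"C:\word_merge"): print(path, list_embedded_parts(path))` would print each document together with the embedded parts it contains. Because nothing is written to the .docx, this side-steps the "leave the file as I found it" concern entirely.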

I know this works because s.OLEFormat.DoVerb() does open each embedded file in Adobe Reader, and the files stay open until I hit Enter, at which point they are closed along with the Word file.
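The `ClassType == 'AcroExch.Document.DC'` check in the loop has a rough file-level counterpart in a .docx package: each embedded object is normally recorded in word/document.xml as an `o:OLEObject` element with a `ProgID` attribute and an `r:id` that points, via word/_rels/document.xml.rels, at the embedded part. The sketch below assumes that standard Office Open XML layout and the usual namespace URIs; the function name ole_objects_by_progid is hypothetical, and objects stored in other ways (for example, floating shapes written differently by some Word versions) may not be found.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as normally used in .docx packages (assumed standard values)
O_NS = "urn:schemas-microsoft-com:office:office"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def ole_objects_by_progid(docx_path, progid="AcroExch.Document.DC"):
    """Return ZIP part names of embedded OLE objects whose ProgID matches."""
    with zipfile.ZipFile(docx_path) as zf:  # read-only access
        doc = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))

    # Relationship Id -> Target (normally relative to the word/ folder)
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")
    }

    parts = []
    for obj in doc.iter(f"{{{O_NS}}}OLEObject"):
        if obj.get("ProgID") != progid:
            continue
        target = targets.get(obj.get(f"{{{R_NS}}}id"))
        if not target:
            continue
        if target.startswith("/"):
            # Absolute target inside the package
            parts.append(target.lstrip("/"))
        else:
            parts.append(posixpath.normpath(posixpath.join("word", target)))
    return parts
```

Passing a different ProgID (for example one reported by `s.OLEFormat.ClassType` for another object type) filters for that class instead.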

This is the point where I need to replace DoVerb() with code that saves the OLE object to a PDF file.

At this point s holds the file I need, but I can't find a way to save it as a file instead of just opening it.
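One possible way to get the "save it as a file" step, using a different technique from win32com, is a stdlib-only sketch that reads each word/embeddings/ part from the .docx ZIP and cuts out the bytes from the first `%PDF-` signature to the end of the last `%%EOF` marker. This is heuristic carving, not an OLE parser: it assumes the PDF bytes are stored contiguously inside the OLE compound file (.bin), which is common but not guaranteed for larger objects. A more robust version would read the embedded stream with a compound-file parser such as the third-party olefile package. The names carve_pdf and extract_pdfs and the `<docx name>_<n>.pdf` output naming are hypothetical.

```python
import zipfile
from pathlib import Path

PDF_START = b"%PDF-"
PDF_END = b"%%EOF"


def carve_pdf(blob):
    """Heuristically cut a PDF out of an embedded OLE blob.

    Returns the bytes from the first '%PDF-' through the last '%%EOF'
    (so incremental updates are kept), or None if either marker is missing.
    Assumes the PDF is stored contiguously inside the blob.
    """
    start = blob.find(PDF_START)
    if start == -1:
        return None
    end = blob.rfind(PDF_END, start)
    if end == -1:
        return None
    return blob[start:end + len(PDF_END)]


def extract_pdfs(docx_path, out_dir):
    """Write each carved PDF from word/embeddings/ into out_dir; return the new paths."""
    docx_path = Path(docx_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as zf:  # read mode: the .docx is not modified
        for name in sorted(zf.namelist()):
            if not name.startswith("word/embeddings/"):
                continue
            pdf = carve_pdf(zf.read(name))
            if pdf is None:
                continue  # not a PDF, or the markers were not found
            target = out_dir / f"{docx_path.stem}_{len(written) + 1}.pdf"
            target.write_bytes(pdf)
            written.append(target)
    return written
```

It is worth opening a few of the extracted files to confirm they are complete before running this over the whole collection; if some come out truncated or corrupted, that suggests the contiguity assumption does not hold for those objects.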

Please help me. I have been reading articles for many hours now and haven't found the answer.
