Matthew Ward

Reputation: 1

.doc to .docx conversion in python

I have been trying to use Python and the win32com client to convert multiple files from .doc to .docx (so I can then edit them with the python-docx package). When I run the code below in PyCharm, it finishes with no errors and an exit code of 0. This is my first foray into Python.

After I run it I get no errors, but the documents are all still .doc files.

from glob import glob
import re
import os
import win32com.client as win32
from win32com.client import constants

paths = glob('C:\test\*.doc', recursive=True)

def save_as_docx(path):
   
    word = win32.gencache.EnsureDispatch('Word.Application')
    doc = word.Documents.Open(path)
    doc.Activate ()
    
    
    new_file_abs = os.path.abspath(path)
    new_file_abs = re.sub(r'\.\w+$', '.docx', new_file_abs)
    
    
    word.ActiveDocument.SaveAs(
        new_file_abs, FileFormat=constants.wdFormatXMLDocument
    )
    doc.Close(False)

for path in paths:
    save_as_docx(path)

After this I get

"Process finished with exit code 0", and all of the files in the C:\test folder are still .doc.

Upvotes: 0

Views: 4256

Answers (1)

AKX

Reputation: 168814

You should debug things a little, e.g. by printing paths. Your glob pattern is wrong: 'C:\test\*.doc' is a string that has C:, then a tab character, then the rest (est\*.doc), so it most likely matches nothing and the loop never runs. You'll need to use a raw string to avoid backslash interpretation:

paths = glob(r'C:\test\*.doc', recursive=True)
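
To see the difference for yourself, here is a minimal sketch comparing a plain string, a raw string, and a string with a doubled backslash. The variable names are only for illustration.

```python
# In a plain string literal, \t is an escape sequence for a tab character.
plain = 'C:\test'
# A raw string keeps the backslash and the letter t as two characters.
raw = r'C:\test'
# Doubling the backslash in a plain string gives the same result as the raw string.
escaped = 'C:\\test'

print(repr(plain))            # repr() makes the hidden tab visible
print(repr(raw))
print(len(plain), len(raw))   # the plain string is one character shorter
print(raw == escaped)
```

Printing with repr() is a handy habit when a path refuses to match, since it shows escape characters that a normal print hides.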

Then, recursive=True does nothing since you're not using a double star. To search subfolders as well, ** has to be a path component of its own:

paths = glob(r'C:\test\**\*.doc', recursive=True)
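
A quick way to check how glob treats the double star is to try a few patterns on a throwaway directory. The sketch below builds one with tempfile (the file names a.doc and sub/b.doc are made up for illustration) and suggests that ** only recurses when it stands alone as a path component; fused with other characters, as in **.doc, it behaves like a single *.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one file at the top level, one in a subfolder.
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ('a.doc', os.path.join('sub', 'b.doc')):
        open(os.path.join(root, rel), 'w').close()

    # Single star: top-level files only.
    flat = glob(os.path.join(root, '*.doc'))
    # Double star fused with the extension: still top-level only.
    fused = glob(os.path.join(root, '**.doc'), recursive=True)
    # Double star as its own component: top level plus all subfolders.
    deep = glob(os.path.join(root, '**', '*.doc'), recursive=True)

    print(len(flat), len(fused), len(deep))
```

Note that the recursive form also picks up files in the top-level folder itself, because ** matches zero or more directories.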

Finally, replacing the extension with .docx is better done with a tool suited to the job, such as os.path.splitext, than with a regular expression, so all in all

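from glob import glob
import os
import win32com.client as win32
from win32com.client import constants

def save_as_docx(path):
    # Get a Word application object via COM and open the .doc file
    word = win32.gencache.EnsureDispatch('Word.Application')
    doc = word.Documents.Open(path)
    doc.Activate()

    # Swap the extension for .docx, keeping the rest of the absolute path
    new_file_abs = os.path.splitext(os.path.abspath(path))[0] + ".docx"

    # Save a copy in the .docx format, then close without saving changes
    word.ActiveDocument.SaveAs(
        new_file_abs, FileFormat=constants.wdFormatXMLDocument
    )
    doc.Close(False)

# Raw string so \t is not read as a tab; ** as its own path component
# makes the search descend into subfolders
paths = glob(r'C:\test\**\*.doc', recursive=True)

for path in paths:
    save_as_docx(path)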

should be closer to what you need.
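
As a small illustration of why splitext is the safer choice, the sketch below compares it with the regular expression from the question. The helper name to_docx_name and the sample paths are hypothetical, and ntpath is used here only as an assumption so that the Windows-style paths are handled the same way on any operating system (on Windows, os.path is ntpath).

```python
import ntpath
import re

def to_docx_name(path):
    """Return path with its extension (if any) replaced by .docx."""
    root, _ext = ntpath.splitext(path)
    return root + '.docx'

# Both approaches agree on an ordinary file name.
print(to_docx_name(r'C:\test\report.doc'))
print(re.sub(r'\.\w+$', '.docx', r'C:\test\report.doc'))

# They differ when there is no extension: splitext still appends .docx,
# while the regex finds nothing to replace and returns the path unchanged.
print(to_docx_name(r'C:\test\README'))
print(re.sub(r'\.\w+$', '.docx', r'C:\test\README'))
```

For the files matched by a *.doc pattern the two give the same result; the difference only shows up on unusual names, which is where a dedicated path function tends to be more predictable.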

Upvotes: 7
