indigochild

Reputation: 372

Checking Hyperlinks in Microsoft Word using Python and win32com

I am working on a program that will open up a Word document and check all the links in that document. It should report if any of the links are broken.

I can do all of that using the win32com library for Python.

However, I am currently using Hyperlink.Follow() to check each link. The problem is that this actually opens each linked document, so my screen quickly fills up with open documents (my test file has about 15 links to different documents; in production I expect there could be hundreds).

How can I stop this from happening? I have a few ideas, but no idea how to go about any of them:

Current program:

import os
import win32com.client

# Settings
debug = True

# Open a specified word document
wordapp = win32com.client.Dispatch('Word.Application')
wordapp.Visible = debug

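# Use an absolute path so Word can locate the file regardless of the working directory
directory = os.path.dirname(os.path.abspath(__file__))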
filename = '0 - Cover.docx'
document_location = os.path.join(directory, filename)

if debug:
    print(document_location)

document = wordapp.Documents.Open(document_location)

if debug:
    print("Document opened successfully.")

# Follow every hyperlink in the document and report the ones that fail
for link in document.Hyperlinks:
    print(link.Name)

    try:
        link.Follow()
    except Exception:
        print("This link is broken.")
    else:
        print("This link did not raise an error.")

Upvotes: 2

Views: 1940

Answers (1)

Zev Spitz

Reputation: 15357

A Hyperlink has two relevant properties: Address, which (for local files) contains a location on the filesystem, and SubAddress, which (for local files) refers to a location within the target item, such as the name of a Word bookmark or an Excel named range of cells.

It might be sufficient to check whether Address maps to a file on the filesystem, without ever opening the document at all. On the other hand, this wouldn't tell you whether the link is entirely functional, as SubAddress might refer to a non-existent name.
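A minimal sketch of that filesystem-only check follows. The helper name is_local_link_broken is hypothetical, and it rests on a few assumptions: relative addresses are resolved against the folder of the document being checked, web and mail links are skipped rather than tested, an empty Address (typical for links that point inside the same document) is not treated as broken, and file:// URLs are not handled.

```python
import os
from urllib.parse import urlparse


def is_local_link_broken(address, base_dir):
    """Return True if `address` looks like a local path that does not exist.

    Hypothetical helper: web/mail links and empty addresses are not judged
    here and return False.
    """
    if not address:
        # Assumption: an empty Address means the link targets the same document.
        return False

    scheme = urlparse(address).scheme.lower()
    if scheme in ("http", "https", "ftp", "mailto"):
        # Not a filesystem location; checking it would need a network request.
        return False

    # Assumption: relative addresses are relative to the document's folder.
    path = address if os.path.isabs(address) else os.path.join(base_dir, address)
    return not os.path.exists(path)


# Possible use with the question's objects (not executed here):
#     for link in document.Hyperlinks:
#         if is_local_link_broken(link.Address, document.Path):
#             print(link.Name, "points to a missing file")
```

Nothing is opened by this check, so no windows appear, but it says nothing about whether SubAddress is valid.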

If you want to check the full functionality of the hyperlinks, and all of them are expected to refer to Word documents, they will probably be opened in the context of the current Application. If that is the case, then you can programmatically access the newly opened document by its name and close it:

import os

opened_doc = wordapp.Documents(os.path.basename(link.Address))
opened_doc.Close()

Caveats:

  • The above will only work for documents which are loaded into the current Application. This excludes other file types (Excel spreadsheets, PowerPoint presentations), or Word documents opened in another Application instance.
  • It's not quite accurate to say that client.Dispatch supports loading documents invisibly; it is the Word object model which by default loads invisibly. In any case, that is irrelevant to Hyperlink.Follow, which (if I understand correctly) depends on system APIs to open the relevant document with the appropriate application.
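Putting the follow-then-close idea and the caveats together, one possible sketch is shown below. The function name follow_and_close and the keep_open parameter are hypothetical. It assumes the target was opened in the same Application; if it was not (another file type, another Word instance, a web link), the lookup by name fails and the target is simply left open.

```python
import os


def follow_and_close(wordapp, link, keep_open=None):
    """Follow `link`; return False if following raised, True otherwise.

    After a successful follow, try to close the opened document, but only
    if it was loaded into `wordapp`. `keep_open` is the file name of the
    document being checked, so that a link back to it does not close it.
    """
    try:
        link.Follow()
    except Exception:
        return False

    name = os.path.basename(link.Address)
    if name and name != keep_open:
        try:
            # 0 is wdDoNotSaveChanges in the Word object model.
            wordapp.Documents(name).Close(0)
        except Exception:
            # Not a document in this Application; nothing to close here.
            pass
    return True


# Possible use with the question's objects (not executed here):
#     for link in document.Hyperlinks:
#         ok = follow_and_close(wordapp, link, keep_open=document.Name)
#         print(link.Name, "OK" if ok else "broken")
```

This keeps the screen from filling up with Word documents, but anything opened by another application still has to be closed by other means.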

Upvotes: 1
