Rahul

Reputation: 11560

Strange behavior of pywin32 when using Word

I am doing this:

import win32com.client as win32
infile = r"D:\path\to\file.docx"
# def word_table(infile):
word = win32.gencache.EnsureDispatch('Word.Application')
doc = word.Documents.Open(infile)
word.Visible = False
rng = doc.Range()
for tbl in rng.Tables:
    # Word collections are 1-based: rows run from 1 to Rows.Count
    for i in range(1, tbl.Rows.Count + 1):
        page_name = tbl.Cell(i, 1).Range.Paragraphs(1).Range.Text
        hyper_link = tbl.Cell(i, 2).Range.Paragraphs(1).Range.Hyperlinks(1).Address
        print(page_name, hyper_link)
# Close the document without saving, then shut down Word
doc.Close(False)
word.Quit()

This prints only hyper_link and not page_name (even if I change the order), but if I do:

print(page_name)
print(hyper_link)

This works just fine. I can't figure out the reason for this unexpected behavior.

I posted it as an answer to this question: How to extract hyperlinks from MS Word table with Python?

Upvotes: 0

Views: 230

Answers (1)

Rahul

Reputation: 11560

This happens because every cell in a Microsoft Word table ends with an end-of-cell marker.

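So page_name = tbl.Cell(i, 1).Range.Paragraphs(1).Range.Text grabs the text in the cell plus a CR ('\r') and a BEL ('\x07'). In a typical terminal the CR moves the cursor back to the start of the line without erasing it, so the hyperlink printed after it overwrites page_name. Printing the two values separately works because each print call ends with a newline, so nothing gets overwritten.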
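To see why the hyperlink wipes out page_name, here is a small pure-Python simulation of how a typical terminal renders one line of output. It is a sketch under stated assumptions: render_terminal_line is a hypothetical helper (not part of pywin32 or Python), it models only CR (return to column 0 without erasing) and BEL (nothing drawn), and "Home" and the URL are made-up sample values.

```python
def render_terminal_line(text: str) -> str:
    r"""Approximate what a typical terminal shows for one line of output.

    Assumed model (hypothetical helper): '\r' moves the cursor back to
    column 0 without erasing, later characters overwrite earlier ones,
    and BEL ('\x07') makes a sound but draws nothing.
    """
    screen = []  # characters currently visible on the line
    col = 0      # cursor position
    for ch in text:
        if ch == "\r":
            col = 0  # carriage return: jump back to the start of the line
        elif ch == "\x07":
            continue  # BEL: audible bell only, nothing is drawn
        else:
            if col < len(screen):
                screen[col] = ch  # overwrite what is already there
            else:
                screen.append(ch)
            col += 1
    return "".join(screen)


# Made-up cell text as Word returns it: the content plus "\r\x07".
page_name = "Home\r\x07"
hyper_link = "https://example.com/home"

# repr() exposes the hidden characters: 'Home\r\x07'
print(repr(page_name))
# print(page_name, hyper_link) joins its arguments with one space;
# the result on screen is " https://example.com/home"
print(render_terminal_line(page_name + " " + hyper_link))
# Printed on its own, page_name survives: "Home"
print(render_terminal_line(page_name))
```

Because the link is longer than the page name, it covers it completely, which matches the "only prints hyper_link" symptom.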

print(page_name.split('\r')[0], hyper_link) works fine in this case, because it drops everything from the first '\r' onward.
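The same cleanup can be wrapped in a small helper. This is a sketch: clean_cell_text and END_OF_CELL are hypothetical names, not part of pywin32 or the Word object model. It uses str.rstrip instead of split('\r')[0]. For the single paragraph returned by Paragraphs(1) both give the same result, but, assuming a multi-paragraph cell's full Range.Text looks like "a\rb\r\x07", rstrip keeps every paragraph while split keeps only the first.

```python
# Word ends the text of each table cell with CR + BEL, i.e. chr(13) + chr(7).
END_OF_CELL = "\r\x07"  # hypothetical constant name


def clean_cell_text(text: str) -> str:
    """Remove trailing CR/BEL characters left by Word's end-of-cell marker."""
    # rstrip treats its argument as a set of characters, so this also
    # handles a bare trailing "\r" (e.g. from a non-final paragraph).
    return text.rstrip(END_OF_CELL)


print(clean_cell_text("Home\r\x07"))        # Home
print(repr(clean_cell_text("a\rb\r\x07")))  # 'a\rb': both paragraphs kept
print(repr("a\rb\r\x07".split("\r")[0]))    # 'a': split keeps only the first
```

In the question's loop this would read page_name = clean_cell_text(tbl.Cell(i, 1).Range.Paragraphs(1).Range.Text).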

Upvotes: 1
