Reputation: 11560
I am doing this:
import win32com.client as win32
infile = r"D:\path\to\file.docx"
# def word_table(infile):
word = win32.gencache.EnsureDispatch('Word.Application')
doc = word.Documents.Open(infile)
word.Visible = False
rng = doc.Range()
for tbl in rng.Tables:
    # Word's Cell(row, column) indices are 1-based
    for i in range(1, tbl.Rows.Count + 1):
        page_name = tbl.Cell(i, 1).Range.Paragraphs(1).Range.Text
        hyper_link = tbl.Cell(i, 2).Range.Paragraphs(1).Range.Hyperlinks(1).Address
        print(page_name, hyper_link)
This prints only hyper_link and not page_name (even if I change the order).
But if I do:
print(page_name)
print(hyper_link)
This works just fine. I could not guess the reason for this unexpected behavior.
I posted it as an answer to this question: How to extract hyperlinks from MS Word table with Python?
Upvotes: 0
Views: 230
Reputation: 11560
This happens because every Microsoft Word table cell ends with an end-of-cell marker.
So page_name = tbl.Cell(i, 1).Range.Paragraphs(1).Range.Text
grabs the text in the cell plus CR ('\r') and BEL ('\x07'). In a console the CR typically returns the cursor to the start of the line, so whatever is printed after it can overwrite page_name. That is why the combined print does not display properly.
print(page_name.split('\r')[0], hyper_link)
works fine in this case.
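As an alternative to split, the trailing marker can be stripped in a small helper. This is only a sketch: clean_cell_text, CELL_END and the sample string are hypothetical names and made-up data, not part of win32com or Word, and it assumes the cell text ends with CR + BEL as described above.

```python
# CR + BEL: the end-of-cell marker assumed to trail Cell.Range.Text
CELL_END = "\r\x07"


def clean_cell_text(raw):
    """Return the cell text without trailing CR/BEL characters."""
    # rstrip removes any trailing run of the listed characters,
    # so line breaks inside the cell text are left alone.
    return raw.rstrip(CELL_END)


# Made-up sample standing in for tbl.Cell(i, 1).Range.Text
raw = "Home page\r\x07"
print(repr(raw))  # repr makes the hidden control characters visible
print(clean_cell_text(raw), "https://example.com")
```

Printing repr(page_name) in the original loop is also a quick way to confirm which control characters a given cell actually contains.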
Upvotes: 1