Reputation: 2157
I have document where some lines are highlighted. I have managed to open it and get text line by line:
doc = Document("/Users/an/PycharmProjects/projects/test.docx")
for para in doc.paragraphs:
print(para.text)
but I don't know how to check which color has each line. Maybe this library has some method for such task?
Upvotes: 0
Views: 475
Reputation: 28863
A run can have a .highlight_color
, which is perhaps what you're looking for:
for p in doc.paragraphs:
for r in p.runs:
print(r.font.highlight_color)
There is no concept of "line" in Word. There are paragraphs, within which there are runs. A run is a sequence of characters that shares the same character formatting (aka. "font").
Note that breaks between runs occur arbitrarily in text and there's nothing stopping a "line" of highlighted text being broken into several runs. So you may need to check for adjacent runs with the same .highlight_color
value to "assemble" those into the same sequence of text in the "apparent" highlighted passage.
Upvotes: 1