Andrew
Andrew

Reputation: 2157

How to get to know paragraph color docx python?

I have document where some lines are highlighted. I have managed to open it and get text line by line:

doc = Document("/Users/an/PycharmProjects/projects/test.docx")

for para in doc.paragraphs:
   print(para.text)

but I don't know how to check which color has each line. Maybe this library has some method for such task?

Upvotes: 0

Views: 475

Answers (1)

scanny
scanny

Reputation: 28863

A run can have a .highlight_color, which is perhaps what you're looking for:

for p in doc.paragraphs:
    for r in p.runs:
        print(r.font.highlight_color)

There is no concept of "line" in Word. There are paragraphs, within which there are runs. A run is a sequence of characters that shares the same character formatting (aka. "font").

Note that breaks between runs occur arbitrarily in text and there's nothing stopping a "line" of highlighted text being broken into several runs. So you may need to check for adjacent runs with the same .highlight_color value to "assemble" those into the same sequence of text in the "apparent" highlighted passage.

Upvotes: 1

Related Questions