Rakesh Agarwal

Reputation: 21

Finding red-colored words in a docx file with Python

I have a Microsoft docx file in which a few words are colored red. I want to read that file with Python and extract those red words.

But I cannot find the APIs to use for this. I tried iterating over a paragraph to access individual words, but I get an error saying the paragraph is not iterable. I'm also not sure how to check the color of a word.

Can you please help with this?

import docx

def readtxt(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        print(para.text)

readtxt('C:\\Users\\X\\some.docx')

Regards

Upvotes: 1

Views: 1786

Answers (1)

Shivam Roy

Reputation: 2061

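Try this. A paragraph is made up of runs, and each run carries its own font settings, so the function below checks the color of each run and returns a list of all contiguous parts of the document that are in red.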

import docx
from docx.shared import RGBColor

def readtxt(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        # A paragraph is a sequence of runs; each run has one font format.
        for run in para.runs:
            # rgb is None when the run has no explicit RGB color set.
            if run.font.color.rgb == RGBColor(255, 0, 0):
                fullText.append(run.text)
    return fullText

fullText = readtxt('filepath.docx')

Also, please check that you're passing the filepath correctly.
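
If you need individual words rather than whole runs, here is a rough sketch that builds on the same idea. Word sometimes splits a single word across several runs, so this joins consecutive red runs before splitting on whitespace. The helper names (is_red, red_words, red_words_in_file) are my own and not part of python-docx, and I'm assuming "red" means exactly RGB (255, 0, 0), set directly on the run.

```python
RED = (255, 0, 0)  # assumption: "red" means pure red, FF0000


def is_red(run):
    """Return True if the run has an explicit RGB font color of pure red."""
    rgb = run.font.color.rgb  # None when no explicit RGB color is set
    return rgb is not None and tuple(rgb) == RED


def red_words(paragraphs):
    """Collect whitespace-separated words from consecutive red runs."""
    words = []
    for para in paragraphs:
        chunk = []  # text of the red runs seen since the last non-red run
        for run in para.runs:
            if is_red(run):
                chunk.append(run.text)
            else:
                # A non-red run ends the current red stretch.
                words.extend("".join(chunk).split())
                chunk = []
        # Flush a red stretch that reaches the end of the paragraph.
        words.extend("".join(chunk).split())
    return words


def red_words_in_file(path):
    """Hypothetical convenience wrapper: open a .docx and return its red words."""
    import docx  # python-docx, imported here so the helpers above work without it
    return red_words(docx.Document(path).paragraphs)
```

You would call it as red_words_in_file('filepath.docx'). Note that doc.paragraphs only covers the body of the document, so text inside tables is not included, and a color that comes from a style rather than from the run itself will not show up in run.font.color.rgb.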

Upvotes: 1
