Reputation: 21
I have a Microsoft Word .docx file in which a few words are colored red. I want to read that file with Python and extract those red words.
But I can't find which APIs to use for this. I tried iterating over each paragraph to access individual words, but I get an error saying the paragraph is not iterable. I'm also not sure how to check the color of a word.
Can you please help with this?
import docx

def readtxt(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        print(para.text)

readtxt('C:\\Users\\X\\some.docx')
Regards
Upvotes: 1
Views: 1786
Reputation: 2061
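Try this. A Paragraph object is not iterable itself; its formatted pieces are exposed as para.runs, and each run's color is available as run.font.color.rgb. The function below returns a list of all runs (contiguous stretches of text with the same formatting) in the document's paragraphs whose font color is red (FF0000):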
import docx
from docx.shared import RGBColor

def readtxt(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        # A paragraph is made of runs; iterate over those instead
        for run in para.runs:
            # rgb is None when the run has no explicit RGB color of its own
            if run.font.color.rgb == RGBColor(255, 0, 0):
                fullText.append(run.text)
    return fullText

fullText = readtxt('filepath.docx')
print(fullText)
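One thing to watch for: Word often splits a single word into several runs (after editing or spell-checking, for example), so the function above can return fragments such as 're' and 'd' instead of 'red'. Below is a minimal sketch that joins consecutive red runs within a paragraph into one string. It still uses the exact FF0000 match from the answer; the names group_colored_runs, extract_red_text and RED are hypothetical helpers, not part of python-docx. The grouping logic works on plain (text, rgb) pairs so it can be tried without a .docx file, and python-docx is only imported when extract_red_text is called.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Pure red, the same value the answer compares against
RED = (0xFF, 0x00, 0x00)


def group_colored_runs(
    runs: Iterable[Tuple[str, Optional[Sequence[int]]]],
    target: Tuple[int, int, int] = RED,
) -> List[str]:
    """Join consecutive runs whose color equals `target` into single strings.

    Each item is a (text, rgb) pair; rgb may be None (no explicit color)
    or any 3-item sequence such as python-docx's RGBColor (a tuple subclass).
    """
    spans: List[str] = []
    current: List[str] = []
    for text, rgb in runs:
        if rgb is not None and tuple(rgb) == target:
            current.append(text)  # extend the current red stretch
        elif current:
            spans.append("".join(current))  # a non-red run ends the stretch
            current = []
    if current:  # flush a stretch that reaches the end of the paragraph
        spans.append("".join(current))
    return spans


def extract_red_text(filename: str) -> List[str]:
    """Return red stretches from the body paragraphs of a .docx file."""
    import docx  # python-docx; imported here so the helper above needs no extra package

    doc = docx.Document(filename)
    result: List[str] = []
    for para in doc.paragraphs:
        pairs = ((run.text, run.font.color.rgb) for run in para.runs)
        # Grouping per paragraph means stretches never span paragraph breaks
        result.extend(group_colored_runs(pairs))
    return result
```

Note that doc.paragraphs only covers top-level body paragraphs; text inside tables, headers and footers is not visited by either version.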
Also, please check that you're passing the file path correctly.
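If the list comes back empty even though the document visibly has red text, the stored color may not be exactly FF0000 (Word's standard palette also offers "Dark Red", C00000), or the color may come from a paragraph or character style rather than from the run itself, in which case run.font.color.rgb is None. Below is a small sketch for checking what the file actually contains, plus a tolerant color test that could replace the exact == comparison. The helper names is_reddish and dump_run_colors are hypothetical, and the thresholds (red channel at least 0xC0, green and blue at most 0x40) are arbitrary assumptions you would tune to your document.

```python
from typing import Optional, Sequence


def is_reddish(rgb: Optional[Sequence[int]], min_red: int = 0xC0, max_other: int = 0x40) -> bool:
    """Heuristic: strong red channel, weak green and blue (thresholds are assumptions)."""
    if rgb is None:  # no explicit RGB color on the run (inherited or automatic)
        return False
    r, g, b = tuple(rgb)
    return r >= min_red and g <= max_other and b <= max_other


def dump_run_colors(filename: str) -> None:
    """Print every run's text next to its stored color, to see which values are used."""
    import docx  # python-docx, imported lazily

    doc = docx.Document(filename)
    for para in doc.paragraphs:
        for run in para.runs:
            print(repr(run.text), run.font.color.rgb)
```

In the answer's loop, the test would then become if is_reddish(run.font.color.rgb):.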
Upvotes: 1