Also

Reputation: 119

Extract positions of bold words with Python

I would like to extract the position of bold words detected in a .docx file.

For that, I have used the docx library, and it successfully detects the words in bold format. However, it is not very useful to extract only the word, since the same word may appear elsewhere in a different format.

For example:

Let's assume that my file.docx contains: "My cat is not a normal cat" (where only the first "cat" is bold)

from docx import Document

document = Document('/path/to/file.docx')

def bold(document):
    # Collect the text of every bold run across all paragraphs.
    Listbolds = []
    for para in document.paragraphs:
        for run in para.runs:
            if run.bold:
                print(run.text)
                Listbolds.append(run.text)
    return Listbolds

This function would give me the word "cat" as output. However, if I then use that result to filter the text down to the words that are not bold, I would also eliminate the second "cat", which is not bold.

Any idea how to get only the position of this word? For example, to obtain 2 as the word position.

Thank you all!

Upvotes: 1

Views: 5468

Answers (1)

09milk

Reputation: 66

I'm not familiar with the docx library, but just from looking at the code, maybe change it to return a boolean list?

from docx import Document

document = Document('/path/to/file.docx')

def get_bold_list(para):
    bold_list = []
    for run in para.runs:
        bold_list.append(run.bold)
    return bold_list

for para in document.paragraphs:
    bold_list = get_bold_list(para)
    # do something with bold_list
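
To get word positions rather than a per-run flag, one option might be to count words while walking the runs. The following is only a rough sketch: the function name bold_word_positions is made up for illustration, it takes plain (text, is_bold) pairs instead of docx objects, it counts positions from 1 as in the question, and it treats a word as bold if any of its characters is bold. As far as I know run.bold can also be None (formatting inherited from the style), which this sketch treats as not bold.

```python
def bold_word_positions(runs):
    """Return 1-based positions of words containing at least one bold character.

    `runs` is an iterable of (text, is_bold) pairs in reading order.
    A falsy is_bold (False or None) is treated as "not bold" (assumption).
    """
    positions = []
    word_index = 0    # number of words started so far
    in_word = False   # True while the current character is inside a word
    for text, is_bold in runs:
        for ch in text:
            if ch.isspace():
                in_word = False
                continue
            if not in_word:
                # First non-space character after whitespace starts a new word.
                in_word = True
                word_index += 1
            # Record each bold word once, even if it spans several runs.
            if is_bold and (not positions or positions[-1] != word_index):
                positions.append(word_index)
    return positions


# Hypothetical runs for "My cat is not a normal cat" with the first "cat" bold.
example_runs = [("My ", False), ("cat", True), (" is not a normal cat", False)]
print(bold_word_positions(example_runs))  # [2]
```

With the docx library this could presumably be called per paragraph as bold_word_positions((run.text, run.bold) for run in para.runs). Because the state is carried across runs, a word split over two runs is still counted as a single word.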

Upvotes: 5
