vikash vishnu

Reputation: 397

How to keep the original font properties (font name, size, bold, italics) when replacing text with python-docx

I am using python-docx for an automation tool. My code replaces certain words from one list with the corresponding words from another list. After it runs, all the properties of the text in the paragraphs and tables are gone (font size, font name, bold or italics on part of the text, bookmarks in the paragraphs or tables), and the result is plain text in "Calibri" at font size 12.

The code that I used is:

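from docx import Document
from docx.document import Document as _Document
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import _Cell, Table
from docx.text.paragraph import Paragraph

wrongWord = "xyz"
correctWord = "abcd"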
def iter_block_items(parent):
    if isinstance(parent, _Document):
        parent_elm = parent.element.body
    elif isinstance(parent, _Cell):
        parent_elm = parent._tc
    else:
        raise ValueError("something's not right")

    for child in parent_elm.iterchildren():
        if isinstance(child, CT_P):
            yield Paragraph(child, parent)
        elif isinstance(child, CT_Tbl):
            yield Table(child, parent)



document = Document(r"F:\python\documentSample.docx")
for block in iter_block_items(document):
    if isinstance(block, Paragraph):
        if wrongWord in block.text:
            block.text = block.text.replace(wrongWord, correctWord)
    else:
        for row in block.rows:
            for cell in row.cells:
                if wrongWord in cell.text:
                    cell.text = cell.text.replace(wrongWord, correctWord)

document.save(r"F:\python\documentSampleAfterChanges.docx")

Could you help me keep the same font size, font name, and other associated properties from the original file after the text replacement?

Upvotes: 1

Views: 404

Answers (1)

scanny

Reputation: 28893

Search and replace is a hard problem in the general case, which is the main reason that feature hasn't been added yet.

What's happening here is that assigning to the .text attribute on the cell is removing all the existing runs and the font-related attributes are removed with those runs.

Font information (e.g. bold, italic, typeface, size) is stored at the run level (a paragraph is composed of zero or more runs). Assigning to the .text attribute removes all the runs and replaces them with a single new run containing the assigned text.

So the challenge is to find the text within the multiple runs somewhere, and preserve as much of the font formatting settings as possible.

This is a hard problem because Word breaks paragraph text into separate runs for many reasons, and runs tend to proliferate. There's no guarantee at all that your search term will be completely enclosed in a single run or start at a run boundary. So perhaps you start to see the challenge of a general-case solution.
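To make the run-boundary problem concrete, here is a minimal sketch that reports which runs a search term touches. The helper name find_run_spans is hypothetical (it is not part of python-docx), and it works on a plain list of strings, which you could build with something like [r.text for r in paragraph.runs].

```python
def find_run_spans(run_texts, needle):
    """For each occurrence of `needle` in the joined run texts, return a list
    of (run_index, start, end) slices showing which part of which run it covers.

    Hypothetical helper for illustration; not a python-docx API.
    """
    if not needle:
        return []

    full = "".join(run_texts)

    # Character offset at which each run starts within the joined text.
    offsets = []
    pos = 0
    for text in run_texts:
        offsets.append(pos)
        pos += len(text)

    matches = []
    start = full.find(needle)
    while start != -1:
        end = start + len(needle)
        pieces = []
        for i, text in enumerate(run_texts):
            run_start, run_end = offsets[i], offsets[i] + len(text)
            # Overlap between this run and the matched range, if any.
            lo, hi = max(start, run_start), min(end, run_end)
            if lo < hi:
                pieces.append((i, lo - run_start, hi - run_start))
        matches.append(pieces)
        start = full.find(needle, end)
    return matches


# "xyz" is split over three runs here, as Word might split it.
spans = find_run_spans(["The value x", "y", "z is wrong"], "xyz")
print(spans)
```

A match that comes back with more than one piece is exactly the case where no single run contains the search term, so a per-run str.replace() would miss it.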

One thing you can do that might work in your case is something like this:

# ---replace text of first run with new cell value---
runs = table_cell.paragraphs[0].runs
runs[0].text = replacement_text
# ---delete all remaining runs---
for run in runs[1:]:
    r = run._element
    r.getparent().remove(r)

Basically this replaces the text of the first run and deletes any remaining runs. Since the first run often contains the formatting you want, this can often work. If the first word is formatted differently though, say bold, then all the replacement text will be bold too. You'll have to see how this approach works in your specific case.
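If replacing the whole first run is too coarse, a somewhat finer approach is to edit only the runs that the match touches and leave the rest alone. The following is a rough sketch under stated assumptions, not a general solution: replace_in_runs is a hypothetical helper (not a python-docx API), it relies only on paragraph.runs and on run.text being readable and assignable, and it gives the replacement text the formatting of the first run the match touches. It only sees the runs returned by paragraph.runs, so text living elsewhere in the paragraph would not be handled.

```python
def replace_in_runs(paragraph, old, new):
    """Replace `old` with `new` across paragraph.runs, keeping the runs.

    Hypothetical helper for illustration. The replacement text is placed in
    the first run the match touches; the matched characters are trimmed from
    any later runs. Returns the number of replacements made.
    """
    if not old:
        return 0

    runs = list(paragraph.runs)
    count = 0
    search_from = 0
    while True:
        texts = [run.text for run in runs]
        full = "".join(texts)
        start = full.find(old, search_from)
        if start == -1:
            return count
        end = start + len(old)

        pos = 0
        first = True
        for run, text in zip(runs, texts):
            run_start, run_end = pos, pos + len(text)
            pos = run_end
            # Part of this run that falls inside the match, if any.
            lo, hi = max(start, run_start), min(end, run_end)
            if lo >= hi:
                continue  # run not touched by the match; leave it as-is
            head = text[:lo - run_start]
            tail = text[hi - run_start:]
            run.text = head + (new if first else "") + tail
            first = False

        count += 1
        # Continue after the inserted text so it is never re-matched.
        search_from = start + len(new)
```

With python-docx you would call this once per paragraph, for example replace_in_runs(block, wrongWord, correctWord) for each Paragraph block, and once for each paragraph in cell.paragraphs for table cells. Runs that are emptied by the replacement are left in place rather than removed; whether that matters depends on your document.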

Upvotes: 2
