Reputation: 61
For example, consider the paragraphs below in a Word document. The paragraphs are inside a table.
I'm trying to replace "get" with "wake", but only where "get" is a whole word, as in paragraph 1. With the code below, it is replaced in both paragraphs, as the result below shows. The behavior is the same for every paragraph in the document. Please suggest a way to make this work as required.
Actual Result:
1. Ok Guys Please wake up.
2. Ok Guys Please waketing up.
doc = docx.Document("path/docss.docx")
def Search_replace_text():
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    for run in paragraph.runs:
                        if str(word.get()) in run.text:
                            text = run.text.split(str(word.get()))  # Gets input from GUI
                            if text[1] == " ":
                                run.text = text[0] + str(replace.get())  # Gets input from GUI
                                print(run.text)
                            else:
                                run.text = text[0] + str(replace.get()) + text[1]
                        else:
                            break
    doc.save("docss.docx")
I want the result as shown below:
Ok Guys Please wake up.
Ok Guys Please getting up.
Actual Result:
Ok Guys Please wake up.
Ok Guys Please waketing up.
Upvotes: 4
Views: 9872
Reputation: 1050
The problem with replacing text in runs is that the text can be split over multiple runs, so a simple find and replace on each run's text will not always work.
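As a small illustration of that point, the sketch below uses stand-in objects rather than real python-docx runs. The `SimpleNamespace` objects and the way the sentence is split are assumptions made for the example; only the `.text` attribute of a run is modelled.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the runs of one paragraph: Word has split
# the word "get" across two runs (only .text is modelled here).
runs = [
    SimpleNamespace(text="Ok Guys Please g"),
    SimpleNamespace(text="et up."),
]

# The paragraph text is the concatenation of its runs' text.
paragraph_text = "".join(run.text for run in runs)

# The word is present in the paragraph as a whole ...
found_in_paragraph = "get" in paragraph_text
# ... but a per-run search never sees it.
found_in_any_run = any("get" in run.text for run in runs)

print(found_in_paragraph, found_in_any_run)
```

A loop that only tests `search_text in run.text` would skip this paragraph, which is why the search has to consider neighbouring runs together.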
This is adapted from my answer to "Python docx Replace string in paragraph while keeping style".
The text to be replaced can be split over several runs, so it needs to be searched for by partial matching: identify which runs hold part of the text, then replace the text in those runs.
This function replaces strings and retains the original text styling. The process is the same whether or not the styling needs to be retained, because it is the styling that can cause text to be broken into multiple runs, even when the text shows no visible styling.
import docx


def docx_find_replace_text(doc, search_text, replace_text):
    paragraphs = list(doc.paragraphs)
    for t in doc.tables:
        for row in t.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    paragraphs.append(paragraph)
    for p in paragraphs:
        if search_text in p.text:
            inline = p.runs
            # Replace strings and retain the same style.
            # The text to be replaced can be split over several runs so
            # search through, identify which runs need to have text replaced
            # then replace the text in those identified
            started = False
            search_index = 0
            # found_runs is a list of (inline index, index of match, length of match)
            found_runs = list()
            found_all = False
            replace_done = False
            for i in range(len(inline)):

                # case 1: found in single run so short circuit the replace
                if search_text in inline[i].text and not started:
                    found_runs.append((i, inline[i].text.find(search_text), len(search_text)))
                    text = inline[i].text.replace(search_text, str(replace_text))
                    inline[i].text = text
                    replace_done = True
                    found_all = True
                    break

                if search_text[search_index] not in inline[i].text and not started:
                    # keep looking ...
                    continue

                # case 2: search for partial text, find first run
                if search_text[search_index] in inline[i].text and inline[i].text[-1] in search_text and not started:
                    # check sequence
                    start_index = inline[i].text.find(search_text[search_index])
                    check_length = len(inline[i].text)
                    for text_index in range(start_index, check_length):
                        if inline[i].text[text_index] != search_text[search_index]:
                            # no match so must be false positive
                            break
                    if search_index == 0:
                        started = True
                    chars_found = check_length - start_index
                    search_index += chars_found
                    found_runs.append((i, start_index, chars_found))
                    if search_index != len(search_text):
                        continue
                    else:
                        # found all chars in search_text
                        found_all = True
                        break

                # case 2: search for partial text, find subsequent run
                if search_text[search_index] in inline[i].text and started and not found_all:
                    # check sequence
                    chars_found = 0
                    check_length = len(inline[i].text)
                    for text_index in range(0, check_length):
                        if inline[i].text[text_index] == search_text[search_index]:
                            search_index += 1
                            chars_found += 1
                        else:
                            break
                    # no match so must be end
                    found_runs.append((i, 0, chars_found))
                    if search_index == len(search_text):
                        found_all = True
                        break

            if found_all and not replace_done:
                for i, item in enumerate(found_runs):
                    index, start, length = [t for t in item]
                    if i == 0:
                        text = inline[index].text.replace(inline[index].text[start:start + length], str(replace_text))
                        inline[index].text = text
                    else:
                        text = inline[index].text.replace(inline[index].text[start:start + length], '')
                        inline[index].text = text
            # print(p.text)


# sample usage as per example
doc = docx.Document('find_replace_test_document.docx')
docx_find_replace_text(doc, 'Testing1', 'Test ')
docx_find_replace_text(doc, 'Testing2', 'Test ')
docx_find_replace_text(doc, 'rest', 'TEST')
doc.save('find_replace_test_result.docx')
Here are a couple of screenshots showing a source document and the result after replacing the text:
'Testing1' -> 'Test '
'Testing2' -> 'Test '
'rest' -> 'TEST'
Source document:
Resultant document:
I hope this helps someone.
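Note that the function above matches substrings, so "get" would still be replaced inside "getting". A possible variation that only replaces whole words, and still copes with text split over runs, is sketched below. It is not part of the original answer: the name `replace_word_across_runs` is hypothetical, the regex word boundary `\b` is a swapped-in technique, and the `SimpleNamespace` objects stand in for runs. The function only assumes that each run has a readable and writable `.text` attribute, so passing `paragraph.runs` from python-docx should work the same way.

```python
import re
from types import SimpleNamespace


def replace_word_across_runs(runs, search, replacement):
    """Replace whole-word matches of `search`, even when split over runs.

    `runs` is any sequence of objects with a writable `.text` attribute.
    Returns the number of replacements made.
    """
    texts = [run.text for run in runs]
    full_text = "".join(texts)
    # \b only matches where a word character meets a non-word character,
    # so this assumes `search` starts and ends with a word character.
    pattern = re.compile(r"\b" + re.escape(search) + r"\b")

    # (start, end) offsets of each run inside the joined paragraph text.
    spans = []
    offset = 0
    for text in texts:
        spans.append((offset, offset + len(text)))
        offset += len(text)

    matches = list(pattern.finditer(full_text))
    # Work backwards so offsets of earlier matches stay valid.
    for match in reversed(matches):
        for run, (run_start, run_end) in zip(runs, spans):
            lo = max(match.start(), run_start)
            hi = min(match.end(), run_end)
            if lo >= hi:
                continue  # this run holds no part of the match
            # The run where the match begins receives the replacement;
            # the matched characters in later runs are simply removed.
            piece = replacement if lo == match.start() else ""
            run.text = run.text[:lo - run_start] + piece + run.text[hi - run_start:]
    return len(matches)


# Stand-in runs (an assumption for the example): "get" is split over two runs.
runs = [
    SimpleNamespace(text="Ok Guys Please g"),
    SimpleNamespace(text="et up. Keep "),
    SimpleNamespace(text="getting up."),
]
count = replace_word_across_runs(runs, "get", "wake")
print(count, "".join(run.text for run in runs))
```

The replacement text takes on the styling of the run where the match starts, which is the same choice the function above makes.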
Upvotes: 6
Reputation: 51
Replace
if str(word.get()) in run.text:
with a little formatting:
if ' {} '.format(str(word.get())) in run.text:
to search for the word on its own (with a space on each side).
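One limitation of matching on surrounding spaces is that it misses a word at the start or end of the text, or one followed by punctuation such as "get.". A regular expression with word boundaries is one possible alternative; the sketch below uses only the standard `re` module, and the helper name `replace_whole_word` is made up for illustration.

```python
import re


def replace_whole_word(text, search, replacement):
    """Replace `search` in `text` only where it appears as a whole word."""
    # re.escape guards against regex metacharacters in the search term;
    # \b assumes the term starts and ends with a word character.
    pattern = r"\b" + re.escape(search) + r"\b"
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally.
    return re.sub(pattern, lambda match: replacement, text)


print(replace_whole_word("Ok Guys Please get up.", "get", "wake"))
print(replace_whole_word("Ok Guys Please getting up.", "get", "wake"))
```

In the question's loop this could be applied as `run.text = replace_whole_word(run.text, str(word.get()), str(replace.get()))`, with the caveat that it only sees words that lie entirely within one run.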
Upvotes: 1