Reputation: 732
Question in Short: How can I do a find and replace (like Ctrl+H in PowerPoint) using the python-pptx module?
Example Code:
from pptx import Presentation

nameOfFile = "NewPowerPoint.pptx"  # Replace this with: path name on your computer + name of the new file.

def open_PowerPoint_Presentation(oldFileName, newFileName):
    prs = Presentation(oldFileName)
    prs.save(newFileName)

open_PowerPoint_Presentation('Template.pptx', nameOfFile)
I have a PowerPoint document named "Template.pptx". My Python program adds some slides to it and puts some pictures in them. Once all the pictures are in the document, it saves it as another PowerPoint presentation.
The problem is that "Template.pptx" still contains all the old week numbers, like "Week 20". I want Python to find all of these and replace them with "Week 25" (for example).
Upvotes: 4
Views: 20577
Reputation: 11
I want to share what is working for me. I'm using Python 3.12.1
from pptx import Presentation
import pathlib

template = pathlib.Path(r'C:\Template.pptx')
outputfile = pathlib.Path(r'C:\Output.pptx')
params = {  # your keys and values
}

def replace_in_paragraphs(paragraphs, replacements):
    """Replace each key with its value inside every run; run formatting is kept."""
    for paragraph in paragraphs:
        for run in paragraph.runs:
            for key, value in replacements.items():
                if key in run.text:
                    run.text = run.text.replace(str(key), str(value))

def search_and_replace(input_path, output_path, **kwargs):
    """Search and replace text in PowerPoint while preserving formatting."""
    prs = Presentation(input_path)
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                replace_in_paragraphs(shape.text_frame.paragraphs, kwargs)
            # Tables sit in graphic frames, which have no text frame of their own,
            # so they are checked separately instead of being skipped.
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        replace_in_paragraphs(cell.text_frame.paragraphs, kwargs)
    prs.save(output_path)

search_and_replace(template, outputfile, **params)
Upvotes: 1
Reputation: 994
Since PowerPoint splits the text of a paragraph into seemingly random runs (and, on top of that, each run carries its own, possibly different, character formatting), you cannot just look for the text in every run: the text may actually be distributed over several runs, and in each of those you'll only find part of the text you are looking for.
Doing it at the paragraph level is possible, but you'll lose all character formatting of that paragraph, which might screw up your presentation quite a bit.
Using the text on paragraph level, doing the replacement and assigning that result to the paragraph's first run while removing the other runs from the paragraph is better, but will change the character formatting of all runs to that of the first one, again screwing around in places, where it shouldn't.
Therefore I wrote a rather comprehensive script that can be installed with
python -m pip install python-pptx-text-replacer
and that creates a command python-pptx-text-replacer
that you can use to do those replacements from the command line, or you can use the class TextReplacer in that package in your own Python scripts. It is able to change text in tables, charts and wherever else some text might appear, while preserving any character formatting specified for that text.
Read the README.md at https://github.com/fschaeck/python-pptx-text-replacer for more detailed information on usage. And open an issue there if you have any problems with the code!
Also see my answer at python-pptx - How to replace keyword across multiple runs? for an example of how the script deals with character formatting...
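To illustrate the point about text being distributed over several runs, here is a minimal sketch (not the code of the package above). It does the replacement on the joined text of a paragraph and hands every character back to the run it came from, so each run keeps its own formatting. The names `replace_in_run_texts` and `replace_in_paragraph` are made up for this example; the only assumption is that run objects have a readable and writable `.text`, as they do in python-pptx.

```python
def replace_in_run_texts(texts, old, new):
    """Return new run texts with every `old` in the joined text replaced by `new`.

    The replacement goes into the run where the match starts; the matched
    characters are dropped from any following runs the match spilled into.
    """
    joined = "".join(texts)
    # owner[i] is the index of the run that holds character i of the joined text.
    owner = [idx for idx, text in enumerate(texts) for _ in text]
    out = [[] for _ in texts]
    i = 0
    while i < len(joined):
        if old and joined.startswith(old, i):
            out[owner[i]].append(new)
            i += len(old)
        else:
            out[owner[i]].append(joined[i])
            i += 1
    return ["".join(parts) for parts in out]


def replace_in_paragraph(paragraph, old, new):
    """Apply replace_in_run_texts to the runs of one paragraph-like object."""
    runs = list(paragraph.runs)
    new_texts = replace_in_run_texts([run.text for run in runs], old, new)
    for run, text in zip(runs, new_texts):
        if run.text != text:
            run.text = text  # only the text changes, the run's formatting stays
```

With python-pptx you would call `replace_in_paragraph(paragraph, 'Week 20', 'Week 25')` for each paragraph of a text frame. The replacement takes on the formatting of the run in which the match starts. Line breaks and fields are not runs, so this sketch ignores them.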
Upvotes: 0
Reputation: 518
Merging the responses above and others in a way that worked well for me (Python 3). All the original formatting was kept:
from pptx import Presentation

def replace_text(replacements, shapes):
    """Takes dict of {match: replacement, ... } and replaces all matches.
    Currently not implemented for charts or graphics.
    """
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        whole_text = "".join(run.text for run in paragraph.runs)
                        whole_text = whole_text.replace(str(match), str(replacement))
                        for idx, run in enumerate(paragraph.runs):
                            if idx != 0:
                                p = paragraph._p
                                p.remove(run._r)
                        if bool(paragraph.runs):
                            paragraph.runs[0].text = whole_text

if __name__ == '__main__':
    prs = Presentation('input.pptx')
    # To get shapes in your slides
    slides = [slide for slide in prs.slides]
    shapes = []
    for slide in slides:
        for shape in slide.shapes:
            shapes.append(shape)
    replaces = {
        '{{var1}}': 'text 1',
        '{{var2}}': 'text 2',
        '{{var3}}': 'text 3'
    }
    replace_text(replaces, shapes)
    prs.save('output.pptx')
Upvotes: 5
Reputation: 139
I encountered a similar issue: the formatted placeholder was spread over multiple run objects. I wanted to keep the formatting, so I could not do the replacement at the paragraph level. In the end I figured out a way to replace the placeholder.
import re

variable_pattern = re.compile(r"\{\{(\w+)\}\}")

# Note: `data` (dict of variable name -> value) and `replace_variable_with`
# are defined elsewhere in my project and are not shown here.
def process_shape_with_text(shape, variable_pattern):
    if not shape.has_text_frame:
        return

    whole_paragraph = shape.text
    matches = variable_pattern.findall(whole_paragraph)
    if len(matches) == 0:
        return

    # Easy case: the whole placeholder sits inside a single run.
    is_found = False
    for paragraph in shape.text_frame.paragraphs:
        for run in paragraph.runs:
            matches = variable_pattern.findall(run.text)
            if len(matches) == 0:
                continue
            replace_variable_with(run, data, matches)
            is_found = True

    if not is_found:
        print("Not found the matched variables in the run segment but in the paragraph, target -> %s" % whole_paragraph)

        matches = variable_pattern.finditer(whole_paragraph)
        space_prefix = re.match(r"^\s+", whole_paragraph)
        match_container = [x for x in matches]
        # (run, new_text) pairs, so run objects do not need to be hashable
        need_modification = []
        for i in range(len(match_container)):
            m = match_container[i]
            # re.match returns None when there is no leading whitespace
            path_recorder = space_prefix.group(0) if space_prefix else ""

            (start_0, end_0) = m.span(0)
            (start_1, end_1) = m.span(1)

            if (i + 1) > len(match_container) - 1:
                right = end_0 + 1
            else:
                right = match_container[i + 1].start(0)

            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    segment = run.text
                    path_recorder += segment

                    if len(path_recorder) >= start_0 + 1 and len(path_recorder) <= right:
                        print("find it")
                        if len(path_recorder) <= start_1:
                            # run holds the opening braces
                            need_modification.append((run, run.text.replace('{', '')))
                        elif len(path_recorder) <= end_1:
                            # run holds the variable name
                            need_modification.append((run, data[m.group(1)]))
                        elif len(path_recorder) <= right:
                            # run holds the closing braces
                            need_modification.append((run, run.text.replace('}', '')))

        for run, value in need_modification:
            run.text = value
Upvotes: 0
Reputation: 8127
Posting code from my own project because none of the other answers quite managed to hit the mark with complex text spanning multiple paragraphs without losing formatting:
from pptx import Presentation

prs = Presentation('blah.pptx')

# To get shapes in your slides
slides = [slide for slide in prs.slides]
shapes = []
for slide in slides:
    for shape in slide.shapes:
        shapes.append(shape)

# In my project this is a method; as a plain function it takes no `self`.
def replace_text(replacements: dict, shapes: list):
    """Takes dict of {match: replacement, ... } and replaces all matches.
    Currently not implemented for charts or graphics.
    """
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        for run in paragraph.runs:
                            cur_text = run.text
                            new_text = cur_text.replace(str(match), str(replacement))
                            run.text = new_text
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            new_text = cell.text.replace(match, replacement)
                            cell.text = new_text

replace_text({'string to replace': 'replacement text'}, shapes)
Upvotes: 12
Reputation: 135
For those of you who just want some code to copy and paste into your program that finds and replaces text in a PowerPoint while KEEPING formatting (just like I did), here you go:
from pptx import Presentation

def search_and_replace(search_str, repl_str, input_path, output_path):
    """Search and replace text in PowerPoint while preserving formatting."""
    # Useful links ;)
    # https://stackoverflow.com/questions/37924808/python-pptx-power-point-find-and-replace-text-ctrl-h
    # https://stackoverflow.com/questions/45247042/how-to-keep-original-text-formatting-of-text-with-python-powerpoint
    prs = Presentation(input_path)
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                if shape.text.find(search_str) != -1:
                    # Check every run of every paragraph, not only the first run
                    # of the first paragraph, so that all occurrences are replaced.
                    for paragraph in shape.text_frame.paragraphs:
                        for run in paragraph.runs:
                            if search_str in run.text:
                                run.text = run.text.replace(str(search_str), str(repl_str))
    prs.save(output_path)
The code above is a combination of many answers, but it gets the job done. It simply replaces search_str with repl_str in every occurrence of search_str.
In the scope of this answer, you would use:
search_and_replace('Week 20', 'Week 25', "Template.pptx", "NewPowerPoint.pptx")
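If the old week number is not always 20, the same run-level loop can be driven by a regular expression instead of a fixed string. The following is a rough sketch under that assumption: `bump_week_text`, `bump_week_numbers` and the `Week <number>` pattern are my own illustration, not part of python-pptx, and it only catches matches that sit inside a single run.

```python
import re

# Assumed wording of the label in the template: "Week" followed by a number.
WEEK_PATTERN = re.compile(r"\bWeek \d+\b")


def bump_week_text(text, new_week):
    """Replace every 'Week <number>' in a plain string with 'Week <new_week>'."""
    return WEEK_PATTERN.sub(f"Week {new_week}", text)


def bump_week_numbers(shapes, new_week):
    """Apply bump_week_text to every run of every text shape.

    Returns the number of runs that were changed.
    """
    changed = 0
    for shape in shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                new_text = bump_week_text(run.text, new_week)
                if new_text != run.text:
                    run.text = new_text  # the run keeps its formatting
                    changed += 1
    return changed
```

You would call `bump_week_numbers(slide.shapes, 25)` for each slide in `prs.slides` and then save the presentation as before.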
Upvotes: 10
Reputation: 127
Here's some code that could help. I found it here:
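from pptx import Presentation

search_str = '{{{old text}}}'
repl_str = 'changed Text'
ppt = Presentation('Presentation1.pptx')
for slide in ppt.slides:
    for shape in slide.shapes:
        # Only touch shapes that actually contain the search string:
        # assigning shape.text rewrites the whole shape and drops run-level formatting.
        if shape.has_text_frame and search_str in shape.text:
            shape.text = shape.text.replace(search_str, repl_str)
ppt.save('Presentation1.pptx')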
Upvotes: -1
Reputation: 9
I know this question is old, but I have just finished a project that uses Python to update a PowerPoint daily. Basically, every morning the Python script runs, pulls the data for that day from a database, places the data in the PowerPoint, and then launches PowerPoint Viewer to play it.
To answer your question, you would have to loop through all the shapes on the page and check if the string you're searching for is in shape.text. You can check whether the shape has text by checking if shape.has_text_frame is true. This avoids errors.
Here is where things get tricky. If you were to just replace the string in shape.text with the text you want to insert, you will probably lose formatting. shape.text is actually a concatenation of all the text in the shape. That text may be split into lots of 'runs', and all of those runs may have different formatting that will be lost if you write over shape.text or replace part of the string.
On the slide you have shapes, and shapes can have a text_frame, and text_frames have paragraphs (at least one, always, even when it's blank), and paragraphs can have runs. Any level can have formatting, and you have no way of determining how many runs your string is split over.
In my case I made sure that any string that was going to be replaced was in its own shape. You still have to drill all the way down to the run and set the text there so that all formatting is preserved. Also, the string you match in shape.text may actually be spread across multiple runs, so when setting the text in the first run, I also set the text in all other runs in that paragraph to blank.
Random code snippet:
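from pptx import Presentation

testString = '{{thingToReplace}}'
replaceString = 'this will be inserted'
ppt = Presentation('somepptxfile.pptx')

def replaceText(shape, string, replaceString):
    # this is the hard part
    # you know the string is in there, but it may be across many runs
    for paragraph in shape.text_frame.paragraphs:
        runs = paragraph.runs
        fullText = ''.join(run.text for run in runs)
        if string not in fullText:
            continue
        # as described above: put the new text in the first run
        # and blank out all other runs in that paragraph
        runs[0].text = fullText.replace(string, replaceString)
        for run in runs[1:]:
            run.text = ''

for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            if shape.text.find(testString) != -1:
                replaceText(shape, testString, replaceString)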
Sorry if there are any typos. I'm at work.....
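The shape, text_frame, paragraph, run hierarchy described above can be walked with a pair of small generators. This is only a sketch: `iter_paragraphs` and `iter_runs` are names invented here, and treating anything with a `shapes` attribute as a group shape is an assumption that holds for python-pptx group shapes but is duck typing, not an official check.

```python
def iter_paragraphs(shapes):
    """Yield every paragraph reachable from `shapes`.

    Covers text frames, table cells and the children of grouped shapes.
    """
    for shape in shapes:
        # Assumption: only group shapes expose their children as `.shapes`.
        children = getattr(shape, "shapes", None)
        if children is not None:
            yield from iter_paragraphs(children)
            continue
        if getattr(shape, "has_text_frame", False):
            yield from shape.text_frame.paragraphs
        if getattr(shape, "has_table", False):
            for row in shape.table.rows:
                for cell in row.cells:
                    yield from cell.text_frame.paragraphs


def iter_runs(shapes):
    """Yield every run of every paragraph found by iter_paragraphs."""
    for paragraph in iter_paragraphs(shapes):
        yield from paragraph.runs
```

Looping `for run in iter_runs(slide.shapes)` for each slide then gives you every run on that slide, which is the level where the text has to be set to keep formatting.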
Upvotes: 0
Reputation: 28893
You would have to visit each shape on each slide and look for a match using the available text features. It might not be pretty because PowerPoint has a habit of splitting runs up into what may seem like odd chunks. It does this to support features like spell checking and so forth, but its behavior there is unpredictable.
So finding the occurrences with things like Shape.text will probably be the easy part. Replacing them without losing any font formatting they have might be more difficult, depending on the particulars of your situation.
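As a sketch of the "easy part", the function below only reports where a string occurs and whether some single run holds it completely, which tells you up front how hard the replacement will be. `find_matches` is a made-up name; the code relies only on `slide.shapes`, `shape.has_text_frame`, `shape.name`, `text_frame.paragraphs`, `paragraph.runs` and `run.text`.

```python
def find_matches(slides, needle):
    """Return (slide_number, shape_name, in_single_run) per matching paragraph.

    in_single_run is True when at least one run contains the whole needle,
    and False when the needle only appears once the runs are joined.
    """
    hits = []
    for slide_number, slide in enumerate(slides, start=1):
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                run_texts = [run.text for run in paragraph.runs]
                if needle not in "".join(run_texts):
                    continue
                in_single_run = any(needle in text for text in run_texts)
                hits.append((slide_number, shape.name, in_single_run))
    return hits
```

Calling `find_matches(prs.slides, 'Week 20')` before replacing shows which paragraphs can be handled with a plain per-run replace and which ones need the text reassembled across runs.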
Upvotes: 2