Reputation: 635
I am printing the text from a .pptx file, but single sentences are being split across several lines at seemingly arbitrary points. Here is a screenshot of one slide.
I read the file with the code below:

    from pptx import Presentation

    prs = Presentation(path_to_presentation)
    for slide in prs.slides:
        for shape in slide.shapes:
            # Skip pictures, charts and other shapes that cannot hold text
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    print(run.text)
I get output like this:
Books include:
Learning Python
by Mark Lutz
Python Essential Reference
by David Beazley
Python Cookbook
, ed. by Martelli, Ravenscroft and Ascher
(online at http://code.activestate.com/recipes/langs/python/)
http://wiki.python.org/moin/PythonBooks
If you compare the screenshot of the slide with the printed text, each bullet point is split into two or more lines. For example, "Learning Python by Mark Lutz" is printed as two lines, "Learning Python" and "by Mark Lutz", and the bullets themselves are missing.
How to fix this issue?
Upvotes: 1
Views: 1716
Reputation: 29021
The short answer is to use paragraph.text, not run.text:
    for paragraph in shape.text_frame.paragraphs:
        print(paragraph.text)
A paragraph is a coherent block of text that flows between the margins without a vertical break. This is a user-level distinction because it affects how we read the content. A run is a sequence of characters that share the same character formatting (font in the broad sense, including bold, italic, etc.). A run is a technical distinction: run boundaries should not be apparent to a reader; they are only there to tell PowerPoint "apply this character formatting to all these characters".
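To make the distinction concrete, here is a small sketch that does not need python-pptx at all. The paragraph object is a hypothetical stand-in built with SimpleNamespace, and the way the sentence is split into two runs is an assumption made for illustration, not something read from your file. The join_runs helper is also just an illustrative name.

```python
from types import SimpleNamespace


def join_runs(paragraph):
    """Concatenate the text of every run in a paragraph, in order."""
    return "".join(run.text for run in paragraph.runs)


# Hypothetical stand-in for one paragraph: a formatting change partway
# through the sentence (assumed here) leaves it stored as two runs.
paragraph = SimpleNamespace(runs=[
    SimpleNamespace(text="Learning Python"),
    SimpleNamespace(text=" by Mark Lutz"),
])

for run in paragraph.runs:
    print(repr(run.text))      # one fragment per run
print(join_runs(paragraph))    # the sentence as a reader sees it
```

In a real file I would still reach for paragraph.text rather than joining runs by hand, since a paragraph can contain things other than runs (a line break, for instance) that a plain join over runs would skip.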
If you print every run separately, they'll break at seemingly random places in the paragraph, depending at least on where italics turn on and off, but also frequently at other places, like where someone edited to add a few characters. PowerPoint does not necessarily minimize the number of runs, even when two consecutive runs have the same formatting. Consequently, they tend to proliferate.
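On the missing bullets: as far as I know the bullet glyph is paragraph formatting rather than text, so it will not show up in paragraph.text either. A rough sketch of one way to approximate the outline is to indent each line by paragraph.level and add your own marker. The function names, the "- " marker and the two-space indent are my own choices here, not anything PowerPoint or python-pptx prescribes.

```python
def outline_lines(prs, indent="  ", marker="- "):
    """Return one line per non-empty paragraph, indented by its level."""
    lines = []
    for slide in prs.slides:
        for shape in slide.shapes:
            # Skip pictures, charts and other shapes that cannot hold text
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                text = paragraph.text.strip()
                if text:  # ignore empty paragraphs
                    lines.append(indent * paragraph.level + marker + text)
    return lines


def print_outline(path):
    """Print the approximate outline of the presentation at `path`."""
    # Imported here so outline_lines() can be tried without python-pptx.
    from pptx import Presentation

    for line in outline_lines(Presentation(path)):
        print(line)
```

Calling print_outline(path_to_presentation) should then give one line per bullet. Note that the marker is added to every paragraph, including titles, so you may want to filter by shape if that matters.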
Upvotes: 4