RonyA

Reputation: 635

python-pptx: sentences are split across multiple lines when printed

I am printing the text of a .pptx file, but single sentences are being split onto new lines at seemingly arbitrary points. Here is a screenshot of one slide: [screenshot of the slide]

When I read the file with the code below:

from pptx import Presentation

# path_to_presentation is the path to the .pptx file
prs = Presentation(path_to_presentation)
for slide in prs.slides:
    for shape in slide.shapes:
        # Skip shapes that cannot hold text (pictures, for example)
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                print(run.text)

I get output like this:

Books include:
Learning Python 
by Mark Lutz
Python Essential Reference 
by David Beazley
Python Cookbook
, ed. by Martelli, Ravenscroft and Ascher
(online at http://code.activestate.com/recipes/langs/python/)
http://wiki.python.org/moin/PythonBooks

Comparing the screenshot of the slide with the printed text, each bullet point is split into two or more lines. For example, "Learning Python by Mark Lutz" is printed as two separate items, "Learning Python" and "by Mark Lutz", and the bullets themselves are missing.

How can I fix this?

Upvotes: 1

Views: 1716

Answers (1)

scanny

Reputation: 29021

The short answer is to use paragraph.text, not run.text:

for paragraph in shape.text_frame.paragraphs:
    print(paragraph.text)

A paragraph is a coherent block of text that flows between margins without a vertical break. This is a user-level distinction because it affects how we read the content. A run is a sequence of characters that share the same character formatting (the font, including bold, italic, etc.). A run is a technical distinction because run boundaries should not be apparent to a reader; runs are just used to tell PowerPoint "apply this character formatting to all these characters".
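To make the relationship concrete, here is a minimal sketch of how a paragraph's text is made up of its runs. The SimpleNamespace objects are stand-ins for python-pptx paragraph and run objects (an assumption for illustration, so the snippet runs without a .pptx file), and join_runs is a hypothetical helper, not part of the library. In python-pptx itself, paragraph.text does this concatenation for you and, as far as I know, also accounts for line breaks inside the paragraph, which a plain join of runs would miss.

```python
from types import SimpleNamespace


def join_runs(paragraph):
    """Rebuild a paragraph's text by concatenating its runs in order."""
    return "".join(run.text for run in paragraph.runs)


# Stand-in for one bullet from the question: the title is italic and the
# author is not, so the bullet is stored as two runs.
fake_paragraph = SimpleNamespace(
    runs=[
        SimpleNamespace(text="Learning Python "),
        SimpleNamespace(text="by Mark Lutz"),
    ]
)

print(join_runs(fake_paragraph))
```

Printing each run separately gives the two-line output from the question; joining them gives the bullet back as one sentence.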

If you print every run separately, the output will break at seemingly random places in the paragraph, depending at least on where italics turn on and off, but also frequently at other places, like where someone edited to add a few characters. PowerPoint does not necessarily minimize the number of runs, even when two consecutive runs have the same formatting. Consequently, runs tend to proliferate.
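Putting this together with the loop from the question, a possible sketch of a paragraph-based extraction follows. The function name iter_paragraph_lines and the indent parameter are hypothetical. Because bullet glyphs are paragraph formatting rather than text, they never appear in paragraph.text; as a rough substitute, this sketch indents each line by paragraph.level, the paragraph's indentation level in python-pptx.

```python
def iter_paragraph_lines(prs, indent="  "):
    """Yield one line per non-empty paragraph, indented by its level."""
    for slide in prs.slides:
        for shape in slide.shapes:
            # Pictures and similar shapes have no text frame.
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                text = paragraph.text
                # Skip paragraphs that are empty or whitespace only.
                if not text.strip():
                    continue
                yield indent * paragraph.level + text


# Usage with python-pptx ("slides.pptx" is a placeholder path):
#
#     from pptx import Presentation
#     for line in iter_paragraph_lines(Presentation("slides.pptx")):
#         print(line)
```

Whether indentation is a good enough replacement for the bullets depends on the deck; titles and body text both start at level 0, so the two are not distinguished here.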

Upvotes: 4
