Find a word in multiple powerpoint files Python

Question

I have many .pptx files in a directory and want to find which of them contain the specific word "data". I wrote the code below; it reads all the files, but it does not give the correct True/False result. For example, in Person1.pptx the word "data" occurs in two shapes, yet every check prints False. Where exactly is the mistake, and why does the code produce incorrect results?

from pptx import Presentation
import os
files = [x for x in os.listdir("C:/Users/../Desktop/Test") if x.endswith(".pptx")]
for eachfile in files:
    prs = Presentation("C:/Users/.../Desktop/Test/" + eachfile)
    print(eachfile)
    print("----------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            print ("Exist? " + str(hasattr(shape, 'data')))

The result is as follows:

Person1.pptx
----------------------
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Person2.pptx
----------------------
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False

The expected result would be to find the word "data" in one of the slides and print True. Specifically, the expected output would be:

Person1.pptx
----------------------
Exist? True

Person2.pptx
----------------------
Exist? False

That is, True if the word exists in any of the shapes on the slides, and False if none of the shapes contain it.

Irina · Accepted Answer

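I found it by myself. :) The mistake was that hasattr(shape, 'data') only checks whether the shape object has an attribute named "data"; it never looks at the shape's text. The text is in shape.text, so that is where the word has to be searched: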

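from pptx import Presentation
import os

files = [x for x in os.listdir("C:/Users/.../Desktop/Test") if x.endswith(".pptx")]

for eachfile in files:
    prs = Presentation("C:/Users/.../Desktop/Test/" + eachfile)
    # any() stops at the first match, so each file is printed at most once.
    # (A plain "break" would only leave the inner loop over shapes.)
    found = any(
        # The text is lowercased for comparison, so the search term must be
        # lowercase too. The shape itself is not modified.
        "whatever_you_are_looking_for" in shape.text.lower()
        for slide in prs.slides
        for shape in slide.shapes
        if hasattr(shape, "text")  # skip shapes without text, e.g. pictures
    )
    if found:
        print(eachfile)
        print("----------------------")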
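
For the "Exist? True / Exist? False" output per file that the question asks for, something along these lines might work. This is only a sketch: the helper names shape_has_word, presentation_has_word and report are made up for illustration, and the objects at the bottom are simple stand-ins, not real python-pptx shapes. It assumes python-pptx is installed when report() is actually called.

```python
import os
from types import SimpleNamespace


def shape_has_word(shape, word):
    """Return True if the shape carries text containing `word` (case-insensitive)."""
    # Shapes without text (pictures, for example) have no usable .text,
    # so getattr falls back to None and the shape is treated as "no match".
    text = getattr(shape, "text", None)
    return isinstance(text, str) and word.lower() in text.lower()


def presentation_has_word(prs, word):
    """Return True if any shape on any slide contains `word`."""
    return any(
        shape_has_word(shape, word)
        for slide in prs.slides
        for shape in slide.shapes
    )


def report(folder, word):
    """Print the file name and one 'Exist? True/False' line per .pptx file."""
    # Imported here so the two helpers above can be tried without python-pptx.
    from pptx import Presentation

    for name in sorted(os.listdir(folder)):
        if name.endswith(".pptx"):
            prs = Presentation(os.path.join(folder, name))
            print(name)
            print("----------------------")
            print("Exist? " + str(presentation_has_word(prs, word)))


# Stand-in object (hypothetical, not a real shape) showing the difference:
fake_shape = SimpleNamespace(text="Sales Data 2020")
print(hasattr(fake_shape, "data"))         # False: no attribute named "data"
print(shape_has_word(fake_shape, "data"))  # True: the text contains the word
```

Calling report() with the folder used above and the word "data" would then print each file name once, followed by a single True or False. Note that this only looks at shape.text, so text inside tables or grouped shapes is not searched.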
