mayautobot

Reputation: 111

Convert ppt file to pptx in Python

Is there any way to convert .ppt files to .pptx files?

Objective: I need to extract text from a table (with column names such as Name, Address, Contact Number, Email, etc.) in .ppt files. So far I have tried this approach:

I converted the .ppt file to PDF and then extracted the data from the PDF using PDFMiner. The extracted text is not separated by any delimiter, so it is very difficult to distinguish names from the other fields in the table.

Probable solution I am working on:

  1. Convert .ppt files to .pptx
  2. Parse the XML of the .pptx file to get the formatted text

I am stuck at the first step: converting the file format from .ppt to .pptx. I couldn't find any way to convert a .ppt file to .pptx format in Python.
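For step 2, once a .pptx exists, a minimal sketch of pulling table cells out of the slide XML using only the standard library might look like the following. It assumes the tables are ordinary DrawingML tables (a:tbl / a:tr / a:tc elements) stored in ppt/slides/slide*.xml; the function names are hypothetical, chosen for illustration only.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for tables and text runs inside slide XML
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def table_rows_from_slide_xml(xml_bytes):
    """Return every table in one slide as a list of rows of cell strings."""
    root = ET.fromstring(xml_bytes)
    tables = []
    for tbl in root.iter(A_NS + "tbl"):
        rows = []
        for tr in tbl.iter(A_NS + "tr"):
            row = []
            for tc in tr.iter(A_NS + "tc"):
                # Join the text runs of each paragraph, then join the
                # paragraphs of the cell with newlines
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(A_NS + "t"))
                    for p in tc.iter(A_NS + "p")
                ]
                row.append("\n".join(paragraphs))
            rows.append(row)
        tables.append(rows)
    return tables


def tables_from_pptx(pptx_path):
    """Yield (slide_part_name, tables) for each slide part in a .pptx file."""
    with zipfile.ZipFile(pptx_path) as zf:
        # Sorted by part name, which is not necessarily the deck order
        for name in sorted(zf.namelist()):
            if name.startswith("ppt/slides/slide") and name.endswith(".xml"):
                yield name, table_rows_from_slide_xml(zf.read(name))
```

Because each cell comes back as its own string, the rows can be written out with any delimiter (for example via the csv module), which avoids the problem of undelimited text from the PDF route.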

Upvotes: 8

Views: 6553

Answers (5)

Steve Rindsberg

Reputation: 14809

Most/all of the other proposed answers assume that PowerPoint is installed, then automate it using Python; from the comments, it seems there are problems with some/all of them.

Since PowerPoint is assumed, and since it has VBA built in, why not use that?

I've posted some code here that will do something to every file in a given folder: https://www.rdpslides.com/pptfaq/FAQ00536_Batch-_Do_something_to_every_file_in_a_folder.htm

For each file found it calls a routine called MyMacro. Change it to call SaveAsPPTX instead and use this:

Sub SaveAsPPTX(sOldName As String)

Dim oPres As Presentation
Dim sNewName As String

' Open the file whose name was passed in as sOldName (read-only):
Set oPres = Presentations.Open(sOldName, msoTrue, , msoFalse)
' Note: this will open the presentation windowlessly
' Saves vast amounts of time

' Strip off .PPT extension (everything from the last "." onward)
sNewName = Left$(sOldName, InStrRev(sOldName, ".") - 1)

' Add .PPTX extension
sNewName = sNewName & ".PPTX"

' Save to new name and close the file
oPres.SaveAs sNewName, ppSaveAsOpenXMLPresentation
oPres.Close

End Sub

Upvotes: 0

piyush kumar

Reputation: 41

I wrote this code; I hope it works for you:

import win32com.client

# Requires Windows with PowerPoint installed
PptApp = win32com.client.Dispatch("Powerpoint.Application")
PptApp.Visible = True
PPtPresentation = PptApp.Presentations.Open(r'D:\ppt\sample.ppt')
# 24 is the value of ppSaveAsOpenXMLPresentation (.pptx)
PPtPresentation.SaveAs(r'D:\ppt\final.pptx', 24)
PPtPresentation.Close()
PptApp.Quit()

Edit: this also works on Python 3.11.9 after running pip install pywin32.

Upvotes: 3

leekyounghwa

Reputation: 26

Works perfectly on Anaconda 3 + Jupyter Notebook:

from glob import glob
import win32com.client

# Collect every .ppt file in the folder (adjust the path to your own)
paths = glob('C:\\yourfilePath\\*.ppt')

def save_as_pptx(path):
    PptApp = win32com.client.Dispatch("Powerpoint.Application")
    PptApp.Visible = True
    PPtPresentation = PptApp.Presentations.Open(path)
    # 24 = ppSaveAsOpenXMLPresentation; "x.ppt" is saved as "x.pptx"
    # next to the original file
    PPtPresentation.SaveAs(path + 'x', 24)
    PPtPresentation.Close()
    PptApp.Quit()

for path in paths:
    print(path)
    save_as_pptx(path)

Upvotes: 0

import os
os.system("libreoffice --headless --invisible --convert-to pptx *.ppt")
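The one-liner above relies on the shell to expand *.ppt. A rough equivalent that expands the pattern in Python and calls LibreOffice once per file could look like this. It is only a sketch: the helper names are hypothetical, and the executable may be called soffice rather than libreoffice depending on how it was installed.

```python
import glob
import subprocess


def build_convert_command(src, outdir=".", binary="libreoffice"):
    """Build the LibreOffice command line that converts one file to .pptx."""
    return [binary, "--headless", "--convert-to", "pptx", "--outdir", outdir, src]


def convert_all(pattern="*.ppt", outdir=".", binary="libreoffice"):
    """Run the conversion for every file matching pattern; return those files."""
    converted = []
    for src in sorted(glob.glob(pattern)):
        # check=True raises CalledProcessError if LibreOffice exits non-zero
        subprocess.run(build_convert_command(src, outdir, binary), check=True)
        converted.append(src)
    return converted
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces, and --outdir keeps the converted files separate from the originals if desired.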

Upvotes: 0

fsfr23

Reputation: 55

For macOS Homebrew users: install Apache Tika (brew install tika).

The command-line interface works like this:

tika --text something.ppt > something.txt

And to use it inside a Python script:

import os
os.system("tika --text temp.ppt > temp.txt")

This extracts the text straight from the .ppt file without converting it first; it is the only solution I have found so far.

Upvotes: 0
