Aliya

Reputation: 31

Convert all PDFs in a folder to text files and store them in different folders using Python

I'm trying to convert all the PDFs stored in one folder (say, 60 PDFs) into text documents and store them in different folders. Each folder should have a unique name. I tried the code below. The folders were created, but the pdftotext conversion command doesn't work in the loop:

import os
def listfiles(path):
    for root, dirs, files in os.walk(path):
        for f in files:
                print(f)
        newpath = r'/home/user/files/'
        p=f.replace("pdf","")
        newpath=newpath+p 
        if not os.path.exists(newpath): os.makedirs(newpath)
        os.system("pdftotext f f.txt")

f=listfiles("/home/user/reports")

Upvotes: 2

Views: 1372

Answers (2)

Matt

Reputation: 744

One problem here is the os.system("pdftotext f f.txt") call. As written, the literal string is passed to the shell, so pdftotext looks for a file named f. I assume you want the f's replaced with the current file in the loop. If that is the case, you need to change this to os.system("pdftotext {0} {0}.txt".format(f))

Another issue may be that the working directory is not set, so the call to os.system looks for the file in the wrong place. Try using os.chdir every time you change folders.

To place the text file in a different folder, try:

os.system("pdftotext {0} {1}/{0}.txt".format(f, newpath))
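Putting these points together, here is a minimal sketch of the whole loop. It assumes pdftotext is installed and on the PATH. The helper names plan_conversions and convert_all are made up for illustration, and it swaps os.system for subprocess.run with an argument list so that file names containing spaces need no shell quoting. It also joins each file name with root, which avoids having to change the working directory.

```python
import os
import subprocess


def plan_conversions(src_dir, dest_root):
    """Return (pdf_path, out_dir, txt_path) for every PDF found under src_dir."""
    jobs = []
    for root, _dirs, files in os.walk(src_dir):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".pdf":
                continue  # skip anything that is not a PDF
            out_dir = os.path.join(dest_root, stem)  # one folder per PDF
            pdf_path = os.path.join(root, name)      # full path, not just the name
            txt_path = os.path.join(out_dir, stem + ".txt")
            jobs.append((pdf_path, out_dir, txt_path))
    return jobs


def convert_all(src_dir, dest_root):
    """Run pdftotext once per PDF (assumes pdftotext is on the PATH)."""
    for pdf_path, out_dir, txt_path in plan_conversions(src_dir, dest_root):
        os.makedirs(out_dir, exist_ok=True)
        # A list of arguments is passed directly, without going through a shell.
        subprocess.run(["pdftotext", pdf_path, txt_path], check=True)


# Example call, using the paths from the question:
# convert_all("/home/user/reports", "/home/user/files")
```

Note that two PDFs with the same base name in different subfolders would end up in the same output folder with this naming scheme.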

Upvotes: 2

KJaeg

Reputation: 686

I don't know Python, but I think I can clearly see a mistake there. It looks like you are just replacing the ".pdf" with a ".txt". Since a PDF isn't just plain text, this won't work. For the conversion, look at the top answer of this post: Python module for converting PDF to text

Upvotes: 0
