dude

Reputation: 1

Python filepaths have double backslashes

Ultimately, I want to loop through every PDF in a specified directory ('C:\Users\dude\pdfs_for_parsing') and print the metadata for each one. The issue is that when I try to loop through the "directory" list, I get the error "FileNotFoundError: [Errno 2] No such file or directory:". I believe this error occurs because my file paths now contain double slashes for some reason.

Example Code

import PyPDF2
import os

path_of_the_directory = r'C:\Users\dude\pdfs_for_parsing'

directory = []
ext = ('.pdf')

def isolate_pdfs():
    for files in os.listdir(path_of_the_directory):
        if files.endswith(ext):
            x = os.path.abspath(files)
            directory.append(x)

    for pdf in directory:
        reader = PyPDF2.PdfReader(pdf)
        information = reader.metadata
        print(information)
        
isolate_pdfs()

If I print the file paths one at a time, I see that they have a single '/', as I'm expecting:

for pdf in directory:
    print(pdf)

The '//' seems to get added when I try to open each of the PDFs with PDFFile = open(pdf, 'rb').

Upvotes: 0

Views: 240

Answers (1)

Eric Darchis

Reputation: 26767

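Your issue has nothing to do with //. The doubled backslashes in the error message are only how Python escapes a backslash when it displays a string; the path itself contains single ones. The actual problem is here: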

os.path.abspath(files)

Say you have C:\Users....\x.pdf. When you list that directory, files contains only the bare name x.pdf. You then take the absolute path of x.pdf, which abspath resolves against the current working directory, not the directory you listed. Replace it with:

x = os.path.join(path_of_the_directory, files)
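To make the difference concrete, here is a minimal sketch comparing the two calls. It assumes a throwaway temporary folder (pdf_dir, a hypothetical stand-in for path_of_the_directory) containing one empty file named x.pdf, so it runs without any real PDFs or PyPDF2.

```python
import os
import tempfile

# pdf_dir is a hypothetical stand-in for path_of_the_directory.
with tempfile.TemporaryDirectory() as pdf_dir:
    # Create an empty placeholder file so the folder has something to list.
    open(os.path.join(pdf_dir, "x.pdf"), "wb").close()

    name = os.listdir(pdf_dir)[0]        # bare file name, no directory part
    wrong = os.path.abspath(name)        # resolved against os.getcwd()
    right = os.path.join(pdf_dir, name)  # resolved against the listed folder

    # abspath simply prefixed the current working directory.
    wrong_is_cwd_based = wrong == os.path.join(os.getcwd(), name)
    # The joined path points at the file that was actually listed.
    right_exists = os.path.exists(right)

    print("abspath gave:", wrong)
    print("join gave:   ", right)
```

Unless the script happens to be run from inside the PDF folder, the abspath result points at a file that does not exist, which is what produces the FileNotFoundError.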

Other notes:

  • PDFFile and PDF shouldn't be in uppercase. Prefer pdf_file and pdf_reader. The latter also avoids confusion with the pdf in for pdf in...
  • Try to use a debugger rather than print statements. This is how I found your bug. It can be the one in your IDE, or on the command line with python -i (which leaves you at an interactive prompt after the script runs) or python -m pdb. You can step through your code, test a few variations, fiddle with the variables...
  • Why is ext = ('.pdf') written with parentheses? They don't do anything, but they suggest that it might be a tuple, which it isn't: a one-element tuple needs a trailing comma, as in ('.pdf',).
  • As an exercise, the first for loop can be written as a list comprehension: directory = [os.path.join(path_of_the_directory, x) for x in os.listdir(path_of_the_directory) if x.endswith(ext)]
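The last two notes can be sketched together as follows. The helper name list_pdfs and the temporary test folder are assumptions made for illustration, not part of the original code; the PyPDF2 step is left out so the sketch needs only the standard library.

```python
import os
import tempfile


def list_pdfs(folder, ext=".pdf"):
    """Return full paths of the entries in `folder` whose names end with `ext`."""
    return [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))  # sorted for a stable order
        if name.endswith(ext)
    ]


# Parentheses alone do not make a tuple; the trailing comma does.
plain_string = ('.pdf')
one_tuple = ('.pdf',)

# Hypothetical test folder with two PDF-named files and one other file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.pdf", "b.txt", "c.pdf"):
        open(os.path.join(folder, name), "wb").close()

    paths = list_pdfs(folder)
    found = [os.path.basename(p) for p in paths]
    all_exist = all(os.path.exists(p) for p in paths)

print(found)
print(type(plain_string).__name__, type(one_tuple).__name__)
```

Each returned path could then be passed to PyPDF2.PdfReader as in the question. Note that str.endswith accepts either a string or a tuple of strings, so both forms of ext would work here.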

Upvotes: 2
