Reputation: 4972

Python looping over folders and its subfolders to read CSV is getting file names but on read_csv it is returning file not found

I am trying to loop over folders and subfolders to access and read CSV files before transforming them into JSON. Here is the code I am working on:

cursor = conn.cursor()
try:
    # Specify the folder containing needed files
    folderPath = 'C:\\Users\\myUser\\Desktop\\toUpload' # Or using input()
    fwdPath = 'C:/Users/myUser/Desktop/toUpload'
    for countries in os.listdir(folderPath):
        for sectors in os.listdir(folderPath+'\\'+countries):
            for file in os.listdir(folderPath+'\\'+countries+'\\'+sectors):
                data = pd.DataFrame()
                filename, _ext = os.path.splitext(os.path.basename(folderPath+'\\'+countries+'\\'+file))
                print(file + ' ' + filename+ ' ' + sectors + ' ' + countries)
                data = pd.read_csv(file)
    # cursor.execute('SELECT * FROM SECTORS')
    # print(list(cursor))
finally:
    cursor.close()
conn.close()

The following print line outputs the file name, the file name without its extension, and then the sector and country folder names:

print(file + ' ' + filename+ ' ' + sectors + ' ' + countries)

myfile.csv myfile WASHSector CTRYIrq

Now, when it comes to reading the CSV, it takes a very long time, and at the end I get the following error:

[Errno 2] File myfile.csv does not exist

Upvotes: 0

Answers (2)

Guillem

Reputation: 2647

Before reading the CSV file, you should build the full path to it. os.listdir returns only the file names, so pandas resolves the bare name against the current working directory and cannot find the file.

import os

# ...
path = os.path.join(folderPath, countries, sectors, file)
data = pd.read_csv(path)
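
To make the difference concrete, here is a minimal sketch that does not need pandas. It builds a throwaway directory tree shaped like the one in the question (the temporary location and the helper name list_full_paths are assumptions for illustration only) and compares what os.listdir returns with the joined paths:

```python
import os
import tempfile


def list_full_paths(folder):
    """Return the full path of every entry in `folder` (hypothetical helper)."""
    return [os.path.join(folder, name) for name in os.listdir(folder)]


# Build a throwaway layout resembling toUpload/<country>/<sector>/<file>.
base = tempfile.mkdtemp()
sector_dir = os.path.join(base, "CTRYIrq", "WASHSector")
os.makedirs(sector_dir)
with open(os.path.join(sector_dir, "myfile.csv"), "w") as fh:
    fh.write("a,b\n1,2\n")

names = os.listdir(sector_dir)        # bare names only, e.g. 'myfile.csv'
paths = list_full_paths(sector_dir)   # names joined onto their folder

print(names)
print(paths)
```

Only the entries in paths can be opened from an arbitrary working directory; the entries in names work only if the script happens to run inside that sector folder.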

Also, instead of using three nested for loops, I recommend using os.walk. It recurses through the directories automatically:

>>> folderPath = 'C:\\Users\\myUser\\Desktop\\toUpload'
>>> for root, _, files in os.walk(folderPath):
...     for f in files:
...         pd.read_csv(os.path.join(root, f))
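
If the tree may also contain files that are not CSVs, filtering by extension avoids parse errors. The following is a sketch under that assumption; iter_csv_paths is a hypothetical helper name, and the demo tree is created in a temporary directory rather than the folder from the question:

```python
import os
import tempfile


def iter_csv_paths(root_folder):
    """Yield the full path of every .csv file under `root_folder` (hypothetical helper)."""
    for root, _dirs, files in os.walk(root_folder):
        for name in sorted(files):
            # Case-insensitive check so 'DATA.CSV' is also picked up.
            if name.lower().endswith(".csv"):
                yield os.path.join(root, name)


# Demo tree: one CSV and one unrelated file two levels down.
demo_root = tempfile.mkdtemp()
demo_sector = os.path.join(demo_root, "CTRYIrq", "WASHSector")
os.makedirs(demo_sector)
for demo_name in ("myfile.csv", "notes.txt"):
    with open(os.path.join(demo_sector, demo_name), "w") as fh:
        fh.write("a,b\n1,2\n")

found = list(iter_csv_paths(demo_root))
print(found)
```

Each yielded path is complete, so it can be passed straight to pd.read_csv.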

Upvotes: 1

DorElias

Reputation: 2313

You need to give pd.read_csv the full path of the file, so change it to:

data = pd.read_csv(folderPath+'\\'+countries+'\\'+sectors + '\\' +file)

Upvotes: 1
