S C

Reputation: 294

Reading multiple csv files into separate dataframes in Python

I have read multiple answers but none have worked in my case so far. I want to read multiple csv files (which may not be in the same directory as my python file), without specifying names (as I may have to read thousands of such files). I want to do something like the last example in this but I am not sure how to add my desktop path.

I tried the following, as given in the link:

# Assign path. The folder "Healthy" contains all the csv files
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))
file_count = len(files)
# create empty list
dataframes_list = []
 
# append datasets to the list
for i in range(file_count):
    temp_df = pd.read_csv("./csv/"+files[i])
    dataframes_list.append(temp_df)

However, I got the following error: "FileNotFoundError: [Errno 2] No such file or directory". I am using macOS. Can someone please help? Thank you!

Upvotes: 1

Views: 939

Answers (4)

tdelaney

Reputation: 77337

In your example, path is the root of each file in files, so you can do

temp_df = pd.read_csv(os.path.join(path, files[i]))

But we really wouldn't do it this way. If the directory doesn't exist, next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy")) raises a StopIteration that the code doesn't handle. I think it would be more natural to use os.listdir, glob.glob or even pathlib.Path. Since pathlib keeps track of the root for you, a good choice is

from pathlib import Path 
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
dataframes_list = [pd.read_csv(file) for file in healthy.iterdir()
    if file.is_file()]
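If the end goal is one table rather than a list of DataFrames, the list comprehension above pairs naturally with pd.concat. A runnable sketch using a throwaway temporary directory (the folder and file contents are made up for the demo; with real data you would point it at the Healthy folder instead):

```python
import tempfile
from pathlib import Path
import pandas as pd

# Build a throwaway directory with two small CSVs so the sketch runs anywhere.
tmp = Path(tempfile.mkdtemp())
(tmp / "a.csv").write_text("x,y\n1,2\n")
(tmp / "b.csv").write_text("x,y\n3,4\n")

frames = [pd.read_csv(f) for f in tmp.iterdir() if f.is_file()]
combined = pd.concat(frames, ignore_index=True)  # one DataFrame, rows renumbered
```

ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's own index.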

Many pandas errors inherit from ValueError. If some files cause problems, you can wrap the read in an exception handler to find out which files fail:

dataframes_list = []
error_files = []

for file in healthy.iterdir():
    if file.is_file():
        try:
            # skiprows value depends on how many header lines your files have
            dataframes_list.append(pd.read_csv(file, skiprows=18))
        except ValueError as e:
            error_files.append(file)
            print(f"{file}: {e}")

Upvotes: 1

Laurent B.

Reputation: 2263

Assuming you do want to filter the files list, excluding non-.csv files, before using the pandas read_csv method:

Proposed code:

Since you did not provide a dataframe to work with, I deliberately left out pd.read_csv, but you would use pd.read_csv(os.path.join(path, f)) in real code.

import os
from pathlib import Path

# Let's suppose path and files have the following values
path = '/home/Motors'
files = ['engine.html', 'engine.csv']

dataframes_list=[]

for f in files:
    if Path(f).suffix == '.csv':
        # temp_df = pd.read_csv(os.path.join(path, f))
        temp_df = os.path.join(path, f)
        dataframes_list.append(temp_df)
print(dataframes_list)

Result :

['/home/Motors/engine.csv']

To answer S C's comment:

What you should do is, as a first step, create an iterator over all the names, and then read it in chunks to get short lists of names to process.

filenames = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']

def iterchunks(filenames, n):
    for i in range(0, len(filenames), n):
        yield filenames[i:i + n]

chk = iterchunks(filenames, n=2)

print(next(chk))       
# ['A', 'B']

print(next(chk))       
# ['C', 'D']
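Looping over the whole generator processes every chunk, including the short trailing one. A sketch reusing the iterchunks helper from above with an odd-length list:

```python
def iterchunks(filenames, n):
    # Yield successive slices of length n; the last slice may be shorter.
    for i in range(0, len(filenames), n):
        yield filenames[i:i + n]

chunks = list(iterchunks(['A', 'B', 'C', 'D', 'E'], n=2))
print(chunks)  # [['A', 'B'], ['C', 'D'], ['E']]
```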

Upvotes: 0

SWEEPY

Reputation: 629

I guess you should specify the whole path in the read_csv method by adding the path variable to the concatenated string. Something like:

for i in range(file_count):
    temp_df = pd.read_csv(path + "/csv/" + files[i])
    dataframes_list.append(temp_df)

You can remove the "/csv/" by doing path + "/" + files[i] directly if your CSV files are in the Healthy directory.
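Note that plain concatenation without a separator silently produces a wrong path; os.path.join inserts the separator for you. A minimal sketch (the file names are hypothetical, the path is the one from the question):

```python
import os

path = "/Users/my_name/Desktop/All hypnograms/Healthy"
files = ["night1.csv", "night2.csv"]  # hypothetical file names

# path + files[0] would give ".../Healthynight1.csv" -- no separator.
full_paths = [os.path.join(path, f) for f in files]
print(full_paths[0])  # /Users/my_name/Desktop/All hypnograms/Healthy/night1.csv
```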

Upvotes: 0

Corralien

Reputation: 120391

You can use pathlib to do that easily:

import pandas as pd
import pathlib

DATA_DIR = pathlib.Path.home() / 'Desktop' / 'All hypnograms' / 'Healthy' / 'csv'

dataframes_list = []
for csvfile in DATA_DIR.glob('**/*.csv'):
    temp_df = pd.read_csv(csvfile)
    dataframes_list.append(temp_df)
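The '**/*.csv' pattern also descends into subdirectories (Path.rglob('*.csv') is equivalent). A runnable sketch with a temporary tree, since the Desktop path from the question won't exist on other machines:

```python
import tempfile
import pathlib
import pandas as pd

# Throwaway tree: one CSV at the top level, one in a nested folder.
root = pathlib.Path(tempfile.mkdtemp())
(root / "top.csv").write_text("a,b\n1,2\n")
(root / "sub").mkdir()
(root / "sub" / "deep.csv").write_text("a,b\n5,6\n")

found = sorted(root.glob('**/*.csv'))
dataframes_list = [pd.read_csv(p) for p in found]
print(len(dataframes_list))  # 2 -- the nested file is found too
```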

Upvotes: 0
