Reputation: 294
I have read multiple answers but none have worked in my case so far. I want to read multiple CSV files (which may not be in the same directory as my Python file) without specifying names, as I may have to read thousands of such files. I want to do something like the last example in this, but I am not sure how to add my desktop path.
I tried the following, as given in the link:
import os
import pandas as pd

# Assign path. The folder "Healthy" contains all the csv files
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))
file_count = len(files)

# create empty list
dataframes_list = []

# append datasets to the list
for i in range(file_count):
    temp_df = pd.read_csv("./csv/" + files[i])
    dataframes_list.append(temp_df)
However, I got the following error: "FileNotFoundError: [Errno 2] No such file or directory:". I am using macOS. Can someone please help? Thank you!
Upvotes: 1
Views: 939
Reputation: 77337
In your example, path is the root directory of each file in files, so you can do

temp_df = pd.read_csv(os.path.join(path, files[i]))
But we really wouldn't do it this way. If the directory doesn't exist, next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy")) raises a StopIteration error that you don't handle. I think it would be more natural to use os.listdir, glob.glob, or even pathlib.Path. Since pathlib keeps track of the root for you, a good choice is:
from pathlib import Path
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
dataframes_list = [pd.read_csv(file) for file in healthy.iterdir()
                   if file.is_file()]
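For comparison, here is a minimal sketch of the glob.glob alternative mentioned above, assuming the CSV files sit directly in the Healthy folder:

import glob
import pandas as pd

# glob returns the full matching paths, so no joining with a root is needed
csv_paths = glob.glob("/Users/my_name/Desktop/All hypnograms/Healthy/*.csv")
dataframes_list = [pd.read_csv(p) for p in csv_paths]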
Many pandas errors inherit from ValueError. If you have problems with some files, you can put the read into an exception handler to find out which files are in error:
dataframes_list = []
error_files = []
for file in healthy.iterdir():
    if file.is_file():
        try:
            # skiprows=18 skips the first 18 lines of each file before parsing
            dataframes_list.append(pd.read_csv(file, skiprows=18))
        except ValueError as e:
            error_files.append(file)
            print(f"{file}: {e}")
Upvotes: 1
Reputation: 2263
Assuming you do indeed want to filter the files list by excluding non-.csv files before using the pandas read_csv method:

Proposed code to execute:

Since you do not provide a dataframe to work with, I deliberately left out pd.read_csv, but in real code you would use pd.read_csv(os.path.join(path, f)).
import os
from pathlib import Path

# Let us suppose path and files have the following values
path = '/home/Motors'
files = ['engine.html', 'engine.csv']

dataframes_list = []
for f in files:
    if Path(f).suffix == '.csv':
        # temp_df = pd.read_csv(os.path.join(path, f))
        temp_df = os.path.join(path, f)
        dataframes_list.append(temp_df)
print(dataframes_list)
Result:
['/home/Motors/engine.csv']
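For reference, the real-code version with the read actually performed would look like this (a sketch only, since no sample CSV content is given here):

import os
from pathlib import Path
import pandas as pd

path = '/home/Motors'
files = ['engine.html', 'engine.csv']

dataframes_list = []
for f in files:
    if Path(f).suffix == '.csv':
        # This time each CSV is actually read into a DataFrame
        dataframes_list.append(pd.read_csv(os.path.join(path, f)))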
To answer S C's comment: as a first step, create an iterator containing all the names, and then read it in chunks to produce short lists of names to process.
filenames = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']

def iterchunks(filenames, n):
    for i in range(0, len(filenames), n):
        yield filenames[i:i + n]

chk = iterchunks(filenames, n=2)
print(next(chk))
# ['A', 'B']
print(next(chk))
# ['C', 'D']
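Each chunk can then be processed in turn; a minimal sketch, where the pd.read_csv call is the hypothetical real work:

# Process the filenames chunk by chunk instead of all at once
for chunk in iterchunks(filenames, n=2):
    for name in chunk:
        # in real code: pd.read_csv(os.path.join(path, name))
        print(f"processing {name}")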
Upvotes: 0
Reputation: 629
I guess you should specify the whole path in the read_csv method by adding the path variable to the concatenated string. Something like:
for i in range(file_count):
    temp_df = pd.read_csv(path + "/csv/" + files[i])
    dataframes_list.append(temp_df)
You can remove the "/csv/" part by writing path + "/" + files[i] (or, more robustly, os.path.join(path, files[i])) if your CSV files are directly in the Healthy directory.
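In that case the loop would become (a sketch using os.path.join to supply the separator):

for i in range(file_count):
    # path from os.walk has no trailing slash, so join the parts explicitly
    temp_df = pd.read_csv(os.path.join(path, files[i]))
    dataframes_list.append(temp_df)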
Upvotes: 0
Reputation: 120391
You can use pathlib to do that easily:
import pandas as pd
import pathlib

DATA_DIR = pathlib.Path.home() / 'Desktop' / 'All hypnograms' / 'Healthy' / 'csv'

dataframes_list = []
for csvfile in DATA_DIR.glob('**/*.csv'):
    temp_df = pd.read_csv(csvfile)
    dataframes_list.append(temp_df)
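Note that the '**/*.csv' pattern searches the csv directory recursively; if all the files sit at the top level, DATA_DIR.glob('*.csv') is enough, and DATA_DIR.rglob('*.csv') is an equivalent spelling of the recursive form.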
Upvotes: 0