Matan

Reputation: 117

Create Individual DataFrames From a List of csv Files

I have a folder of csv files that I'd like to loop over to create individual DataFrames, each named after the file it comes from. So if I have file_1.csv, file_2.csv, file_3.csv ... I'd like a DataFrame created for each file, named after the file whose data it contains.

Here is what I've tried so far:

# get list of all files
all_files = os.listdir("./Data/")

# get list of only csv files
csv_files = list(filter(lambda f: f.endswith('.csv'), all_files))

# remove file extension to get name only
file_names = []
for i in csv_files:
    file = i[:-4]
    file_names.append(file)
    
# create DataFrames from each file named after the corresponding file
dfs = []
def make_files_dfs():
    for a,b in zip(file_names, csv_files):
        if a == b[:-4]:
            a = pd.read_csv(eval(f"'Data/{b}'"))
            dfs.append(a)

Error code:

ParserError: Error tokenizing data. C error: Expected 70 fields in line 7728, saw 74
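
The ParserError suggests that some rows in one of the files contain extra delimiters (74 fields where the header defines 70). A minimal sketch of one way to get past it while investigating, assuming pandas 1.3+ and a placeholder file name:

import pandas as pd

# Skip rows whose field count doesn't match the header (pandas 1.3+ API).
# 'Data/file_1.csv' is a placeholder, not necessarily the offending file.
df = pd.read_csv("Data/file_1.csv", on_bad_lines="skip")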

Update 1

new attempt 1:

path = "./Data/"
os.chdir(path)

csv_files = glob.glob("*.csv")

dataFrameDict = {}
def make_files_dfs():
    for a in csv_files:
        dataFrameDict[a[:-4] , pd.read_csv(a)]

Error code:

TypeError: unhashable type: 'DataFrame'

I feel like this needs a line to append the dicts to a list; I'll keep messing with it.
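
The TypeError above comes from dataFrameDict[a[:-4], pd.read_csv(a)], which indexes the dict with a (name, DataFrame) tuple, and a DataFrame is not hashable. A minimal sketch of the assignment form, reusing the names from the attempt above:

import glob
import pandas as pd

csv_files = glob.glob("*.csv")

dataFrameDict = {}
for a in csv_files:
    # Assign key = value instead of indexing with a tuple,
    # so the DataFrame never needs to be hashed.
    dataFrameDict[a[:-4]] = pd.read_csv(a)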

new attempt 2:

path = "./Data/"
os.chdir(path)

csv_files = glob.glob("*.csv")

for i in range(len(csv_files)):
    globals()[f"df_{i}"] = pd.read_csv(csv_files[i])

Error code:

ParserError: Error tokenizing data. C error: Expected 70 fields in line 7728, saw 74


Update 2

path = "./Data/"
os.chdir(path)

csv_files = glob.glob("*.csv")

csv_names = []
for i in csv_files:
    name = i[:-4]
    csv_names.append(name)
    
zip_object = zip(csv_names, csv_files)

df_collection = {}
for name, file in zip_object:
    df_collection[name] = pd.read_csv(file, low_memory=False)
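
With that dictionary built, each frame can be pulled out by the base name of the file it came from; a small usage sketch, assuming a file_1.csv as in the example at the top:

# Access one DataFrame by the name of its source file.
print(df_collection["file_1"].head())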

Upvotes: 0

Views: 1414

Answers (2)

Ammar Bin Aamir

Reputation: 115

I'm not sure why your code is so lengthy; this can be done as follows:

import pandas as pd

csv_list = ['file_1.csv', 'file_2.csv', 'file_3.csv']
for i in range(len(csv_list)):
    # create variables df_0, df_1, df_2 in the global namespace
    globals()[f"df_{i}"] = pd.read_csv(csv_list[i])

Output:

Three DataFrames will be created: df_0 holds the first file in the list, df_1 the second, and so on.
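
If the frames should be named after the files rather than an index, a minimal variant of the same idea is sketched below; the os.path.splitext call and the df_<name> naming convention are assumptions, not part of the answer above:

import os
import pandas as pd

csv_list = ['file_1.csv', 'file_2.csv', 'file_3.csv']
for f in csv_list:
    # e.g. file_1.csv -> variable df_file_1
    name = os.path.splitext(os.path.basename(f))[0]
    globals()[f"df_{name}"] = pd.read_csv(f)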

Upvotes: 1

Raman

Reputation: 56

Your code is a bit difficult to understand, and some of the functions are unnecessary. First of all, it is easier to change the working directory with os.chdir(path). Secondly, you can get rid of your lambda function and use glob.glob. Lastly, you cannot name a variable after the contents of another variable like that; your dfs list just holds anonymous DataFrame objects, which won't tell you which file each one came from. It is much better to use a dictionary. Overall, this is how your code could look:

import os
import glob
import pandas as pd

path = "the path to your data"
os.chdir(path)

# get list of only csv files
csv_files = glob.glob("*.csv")

# create a dictionary with the DF name as key and the DataFrame as value
dataFrameDictionary = {}
def make_files_dfs():
    for a in csv_files:
        # strip the .csv extension so the key is the bare file name
        dataFrameDictionary[a[:-4]] = pd.read_csv(a)
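
A short usage sketch of the above; the "file_1" key assumes a file_1.csv in the data folder:

make_files_dfs()
# Each DataFrame is retrieved by its file's base name.
print(dataFrameDictionary["file_1"].shape)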
            

Upvotes: 1
