Reputation: 117
I have a folder of csv files that I'd like to loop over to create individual DataFrames named after the file itself.
So if I have file_1.csv, file_2.csv, file_3.csv ... I'd like DataFrames created for each file and have the df named after the file of the data it contains.
Here is what I've tried so far:
# get list of all files
all_files = os.listdir("./Data/")
# get list of only csv files
csv_files = list(filter(lambda f: f.endswith('.csv'), all_files))
# remove file extension to get name only
file_names = []
for i in csv_files:
    file = i[:-4]
    file_names.append(file)
# create DataFrames from each file named after the corresponding file
dfs = []
def make_files_dfs():
    for a,b in zip(file_names, csv_files):
        if a == b[:-4]:
            a = pd.read_csv(eval(f"'Data/{b}'"))
            dfs.append(a)
error code:
ParserError: Error tokenizing data. C error: Expected 70 fields in line 7728, saw 74
Edit: now using chdir instead of listdir, and replacing the lambda with glob.
new attempt 1:
path = "./Data/"
os.chdir(path)
csv_files = glob.glob("*.csv")
dataFrameDict = {}
def make_files_dfs():
    for a in csv_files:
        dataFrameDict[a[:-4] , pd.read_csv(a)]
Error Code:
TypeError: unhashable type: 'DataFrame'
I feel like this needs a line to append the dicts to a list; will mess with it.
new attempt 2:
path = "./Data/"
os.chdir(path)
csv_files = glob.glob("*.csv")
for i in range(len(csv_files)):
    globals()[f"df_{i}"] = pd.read_csv(csv_files[i])
Error code:
ParserError: Error tokenizing data. C error: Expected 70 fields in line 7728, saw 74
path = "./Data/"
os.chdir(path)
csv_files = glob.glob("*.csv")
csv_names = []
for i in csv_files:
    name = i[:-4]
    csv_names.append(name)
zip_object = zip(csv_names, csv_files)
df_collection = {}
for name, file in zip_object:
    df_collection[name] = pd.read_csv(file, low_memory=False)
Upvotes: 0
Views: 1414
Reputation: 115
I don't understand why your code is so lengthy; this can be done as follows:
csv_list = ['file_1.csv', 'file_2.csv', 'file_3.csv']
for i in range(len(csv_list)):
    globals()[f"df_{i}"] = pd.read_csv(csv_list[i])
Output:
Three DataFrames will be created. Because range starts at 0, df_0 will hold the 1st file in the list, df_1 the 2nd file, and df_2 the 3rd.
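One caveat: the ParserError quoted in the question (Expected 70 fields in line 7728, saw 74) means a row in one of the files has more fields than pandas expected, so a loop like this one would hit it too. To see which rows are affected, a minimal standard-library sketch could look like the following. The function name find_ragged_rows and the comma delimiter are assumptions for illustration, not anything from the question.

```python
import csv


def find_ragged_rows(csv_path, delimiter=","):
    """Return (expected_width, [(line_number, width), ...]).

    expected_width is the number of fields in the first (header) row.
    Every later non-empty row with a different field count is reported.
    Hypothetical helper for illustration only.
    """
    ragged = []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            # Empty file: nothing to compare against.
            return 0, ragged
        expected = len(header)
        for row in reader:
            # Blank lines come back as empty lists; skip them.
            if row and len(row) != expected:
                # reader.line_num is the number of physical lines read so far.
                ragged.append((reader.line_num, len(row)))
    return expected, ragged
```

Calling find_ragged_rows("file_1.csv") returns the header width and a list of (line number, field count) pairs. This reports rows that are too short as well as too long, although pandas only raises for the too-long ones. Once the bad rows are known they can be fixed at the source; pandas 1.3 and later also accepts on_bad_lines="warn" or on_bad_lines="skip" in read_csv, if dropping those rows is acceptable.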
Upvotes: 1
Reputation: 56
Your code is a bit difficult to understand, and it has some unnecessary steps. First, it is easier to change the working directory with os.chdir(path). Second, you can get rid of the lambda function and use glob.glob instead. Lastly, you cannot cleanly create a variable whose name comes from a string held in another variable. Your dfs list will only hold the DataFrame objects themselves, with nothing to tell you which file each one came from. It is much better to use a dictionary keyed by file name. Overall, this is how your code could look:
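import os
import glob
import pandas as pd

path = "the path to your data"
os.chdir(path)
# get list of only csv files (relative pattern, so it matches files in the current directory)
csv_files = glob.glob("*.csv")
# create a dictionary with the DF names as keys and the DataFrames as values
dataFrameDictionary = {}
def make_files_dfs():
    for a in csv_files:
        # strip the ".csv" extension to get the key, then assign the DataFrame to it
        dataFrameDictionary[a[:-4]] = pd.read_csv(a)

make_files_dfs()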
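The same dictionary idea can also be written without changing the working directory. This is only a minimal sketch: the function name load_csv_folder and its read_csv_kwargs parameter are made up for illustration, and it assumes the CSV files sit directly inside the folder rather than in subfolders.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder, **read_csv_kwargs):
    """Read every *.csv directly inside `folder` into a dict keyed by file stem.

    Hypothetical helper for illustration. Extra keyword arguments are
    passed through to pd.read_csv unchanged.
    """
    frames = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # Path.stem drops the final suffix: "file_1.csv" -> "file_1"
        frames[csv_path.stem] = pd.read_csv(csv_path, **read_csv_kwargs)
    return frames
```

With the layout from the question, frames = load_csv_folder("./Data", low_memory=False) would give frames["file_1"], frames["file_2"], and so on. Using the stem as the key avoids the a[:-4] slice, which silently assumes a four-character extension.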
Upvotes: 1