Amar Kumar

Reputation: 43

Read multiple CSV files from a directory with pandas and store them in a list of arrays, each file as one observation

I have a folder containing 30 CSV files, all with different names.

I want to loop through all the files, read each one separately with pandas, and store them in a list of lists. While reading them, I also want to drop some variables at the same time, such as columns that are correlated.

Currently, I am trying to do this.

import glob
import os

import pandas as pd

path = os.getcwd()


# Collect all CSV files in the current directory
file_list = glob.glob(path + '/*.csv')
data = []
for file_path in file_list:
    data.append(
        pd.read_csv(file_path).drop(['column1', 'column2'], axis=1))
# now you can access it outside the "for loop..."
for d in data:
    print(d)

So, I want to store each data frame as a 2-D list inside a list and train my model, because each CSV file is one observation. Each CSV file has shape (5000, 12). I have a label for each CSV instance, which comes from the filename.

I don't know if I am moving in the right direction.

len(data)
# 30

label = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3]
max_length = 25 # shape of data frame after removing two variables
# define the model

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(24, input_dim=25, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
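Before the model can be trained, the list of per-file DataFrames has to become a single array with one matching label per file. A minimal sketch of that stacking step, using small synthetic frames in place of the real CSVs (the shapes here are invented for illustration; the real files are (5000, 12) before dropping two columns):

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the 30 CSV files: each frame is 5 rows x 4 columns
frames = [pd.DataFrame(np.random.rand(5, 4)) for _ in range(30)]

# Stack the per-file frames into one 3-D array: (files, rows, columns)
data = np.stack([df.values for df in frames])
print(data.shape)  # (30, 5, 4)

# One label per file, in the same order as `frames`
labels = np.array([1] * 10 + [2] * 10 + [3] * 10)
print(labels.shape)  # (30,)
```

A `Dense` layer expects 2-D input, so the (rows, columns) part of each observation would still need to be flattened or fed to a sequence model; the stacking above only gets the data into one array.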

Upvotes: 1

Views: 1992

Answers (1)

jezrael

Reputation: 863226

I think you can create dictionary of DataFrames with keys from filenames:

file_list = glob.glob(path + '/*.csv')
dfs = {os.path.basename(fp).split('.')[0]: 
       pd.read_csv(fp).drop(['column1', 'column2'], axis=1) for fp in file_list}

To select a DataFrame by filename, use:

print(dfs['turbo1'])
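Since the filename carries the label, the dict keys can double as labels later. An end-to-end sketch of this pattern, writing a few tiny CSVs to a temporary directory first (the filenames and column names here are invented for illustration):

```python
import glob
import os
import tempfile

import pandas as pd

# Create a temporary directory with a few tiny CSV files
tmp = tempfile.mkdtemp()
for name in ('turbo1', 'turbo2', 'turbo3'):
    pd.DataFrame({'column1': [1], 'column2': [2], 'speed': [3]}) \
        .to_csv(os.path.join(tmp, name + '.csv'), index=False)

# Build the dict of DataFrames keyed by filename (without extension),
# dropping the unwanted columns while reading
file_list = glob.glob(os.path.join(tmp, '*.csv'))
dfs = {os.path.basename(fp).split('.')[0]:
       pd.read_csv(fp).drop(['column1', 'column2'], axis=1)
       for fp in file_list}

print(sorted(dfs))          # ['turbo1', 'turbo2', 'turbo3']
print(list(dfs['turbo1']))  # ['speed']
```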

EDIT:

import numpy as np

dfs = np.array([pd.read_csv(fp).drop(['column1', 'column2'], axis=1).values
                for fp in file_list])

Upvotes: 3
