Reputation: 43
I have a folder with 30 CSV files in it, all with different names.
I want to loop through all the files, read each one separately using pandas, and store them in a list of lists. While reading them I also want to remove some variables, e.g. drop columns that are correlated.
Currently, I am trying this:
import os
import glob
import pandas as pd

# Get folder path containing the CSV files
path = os.getcwd()
file_list = glob.glob(path + '/*.csv')

data = []
for file_path in file_list:
    # read each file and drop the unwanted columns
    data.append(pd.read_csv(file_path).drop(['column1', 'column2'], axis=1))

# now you can access it outside the for loop
for d in data:
    print(d)
So, I want to store each data frame as a 2D list inside a list and train my model, because each CSV file is one observation. Each CSV file has shape (5000, 12), and I have a label for each CSV/instance, which is the filename.
I don't know if I am moving in the right direction.
len(data)
# 30
label = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3]
from keras.models import Sequential
from keras.layers import Dense

max_length = 25  # number of input features after removing two variables

# define the model
model = Sequential()
model.add(Dense(24, input_dim=max_length, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
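For reference, here is a minimal sketch of how the list of frames and the labels could be wired together, assuming each file is flattened into one feature vector and that the order of file_list matches the order of label; note that a three-class problem would usually use a 3-unit softmax output with sparse_categorical_crossentropy rather than a single sigmoid. This is only an illustration, not a confirmed setup:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# stack the 30 frames: shape (30, 5000, 10), then flatten each
# observation into one feature vector: shape (30, 50000)
X = np.stack([d.values for d in data]).reshape(len(data), -1)
y = np.array(label) - 1  # shift labels 1..3 to 0..2

model = Sequential()
model.add(Dense(24, input_dim=X.shape[1], activation='relu'))
model.add(Dense(3, activation='softmax'))  # one output unit per class
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['acc'])
model.fit(X, y, epochs=10, batch_size=4)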
Upvotes: 1
Views: 1992
Reputation: 863226
I think you can create a dictionary of DataFrames with keys taken from the filenames:
file_list = glob.glob(path + '/*.csv')
dfs = {os.path.basename(fp).split('.')[0]:
       pd.read_csv(fp).drop(['column1', 'column2'], axis=1)
       for fp in file_list}
Then to select one DataFrame, use:
print (dfs['turbo1'])
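Since the label for each instance is the filename, the dictionary keys can double as the labels (a small addition beyond the original answer; turbo1 above is just an example name):

# keys and values stay aligned by insertion order (Python 3.7+)
labels = list(dfs.keys())    # filenames without extension
frames = list(dfs.values())  # the matching DataFrames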
EDIT: If you instead need one 3D NumPy array (one 2D block of values per file), use:
import numpy as np

dfs = np.array([pd.read_csv(fp).drop(['column1', 'column2'], axis=1).values
                for fp in file_list])
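With the shapes stated in the question (5000 rows, 12 columns, 2 of them dropped), this should give an array of shape (30, 5000, 10), assuming every file has the same shape:

print(dfs.shape)  # (30, 5000, 10)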
Upvotes: 3