Reputation: 11
I'm writing a script that reads data from CSV files and, after every 8 files read, concatenates the data into a separate CSV file. The program is taking quite a long time to do this. Is there a more optimized way to go about it?
I'm currently reading each CSV file into a pandas DataFrame, appending the DataFrames to a list, and then combining them with pd.concat() afterwards.
edit:
The inputs to the pd.read_csv call are the root directory and the name of the file being read, since I'm using os.walk to move from folder to folder. Each folder contains a varying number of CSV files storing a model's MSE, RMSE, and MAE. The reason I'm using a DataFrame is that I want to use the data in each CSV file for further analysis (the reason it concatenates every 8 files is that each model has 8 outputs). All CSV files have one header row and are 6 columns by 5 rows.
code snippet:
data = []
data_value = pd.read_csv(os.path.join(root, file), sep='\t')  # read one file into a DataFrame
data.append(data_value)  # append the DataFrame to the list
df = pd.concat(data)  # concatenate all DataFrames in the list into one
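For context, here is a minimal runnable sketch of how the pieces described above could fit together. The directory name, function name, and output file names are hypothetical; it assumes tab-separated files and a chunk size of 8, as stated in the question:

```python
import os

import pandas as pd


def combine_results(root_dir, chunk_size=8):
    """Walk root_dir, read each tab-separated CSV into a DataFrame, and
    write a combined CSV after every chunk_size files (one chunk per model)."""
    data = []
    chunk_no = 0
    for root, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            # Skip non-CSV files and our own output files
            if not name.endswith(".csv") or name.startswith("combined"):
                continue
            data.append(pd.read_csv(os.path.join(root, name), sep="\t"))
            if len(data) == chunk_size:
                out_path = os.path.join(root_dir, f"combined_{chunk_no}.csv")
                pd.concat(data, ignore_index=True).to_csv(out_path, index=False)
                chunk_no += 1
                data = []  # start the next chunk fresh
```

Appending DataFrames to a list and calling pd.concat once per chunk, as here, is already the idiomatic pattern; if this is slow, the bottleneck is more likely the number of small files being opened than the concatenation itself.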
Upvotes: 1
Views: 58
Reputation: 3
As others have said, this question is fairly generic and doesn't give much information about the issue. That said, the simplest improvement is to read all the files separately and concatenate them in a single call, rather than building up the list by appending inside a loop.
df1 = pd.read_csv(path_to_file1, ...)
df2 = pd.read_csv(path_to_file2, ...)
df3 = pd.read_csv(path_to_file3, ...)
df4 = pd.read_csv(path_to_file4, ...)
df5 = pd.read_csv(path_to_file5, ...)
df6 = pd.read_csv(path_to_file6, ...)
df7 = pd.read_csv(path_to_file7, ...)
df8 = pd.read_csv(path_to_file8, ...)
df_final = pd.concat(
    [df1, df2, df3, df4, df5, df6, df7, df8],
    **kwargs
)
Or you could concatenate just 2 files per execution, store the resulting file, and repeat until everything has been combined. Note that by "recursively" I don't mean writing a recursive function, since that would be too memory-costly. Instead, create a script that concatenates 2 files and stores the result, then use that result as one of the DataFrames to concatenate in the next run of the script.
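The two-files-per-run idea could be sketched like this. The function name and command-line interface are hypothetical; each invocation merges exactly two CSVs, and the output file is fed back in as one of the inputs on the next run:

```python
import sys

import pandas as pd


def concat_pair(path_a, path_b, out_path):
    """Concatenate exactly two CSV files and write the combined result."""
    a = pd.read_csv(path_a)
    b = pd.read_csv(path_b)
    pd.concat([a, b], ignore_index=True).to_csv(out_path, index=False)


if __name__ == "__main__":
    # Usage: python concat_pair.py file_a.csv file_b.csv out.csv
    concat_pair(sys.argv[1], sys.argv[2], sys.argv[3])
```

Be aware that this approach re-reads and re-writes the growing output file on every run, so for many small files it will usually be slower than a single pd.concat over the full list.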
Upvotes: 0