Reputation: 145
I am working with the following script to analyze some data from an experiment. At the end I would like to save certain data as a DataFrame in .csv format so I can keep working with it, but I am struggling with where to build this DataFrame in the code.
import os
import numpy as np
import pandas as pd
import pylab as plt
import scipy
from scipy.optimize import curve_fit

DATE = '2020-02-18'
SAMPLE_NAME = 'TEST'
os.chdir('XXX/' + DATE + '/' + SAMPLE_NAME)
MainFolder = 'XXX/' + DATE + '/' + SAMPLE_NAME
print('\n' + 'You are working on this directory: \n', os.getcwd(), '\n')

def LoadData(files, subfolders):
    ''' This function loads the data from the files '''
    print('The sweeps in the folder are:')
    df = []
    for file_name in files:
        if file_name.endswith('.csv'):
            print(' ' + os.path.sep + file_name)
            df.append(pd.read_csv(subfolders + os.path.sep + file_name, delimiter=','))
    return df

def fit_sin_LD(t_LD, y_LD):
    ...  # body omitted in the post

def fit_sin_APD(t_APD, y_APD):
    ...  # body omitted in the post

def Plots():
    for root, dirs, files in os.walk(MainFolder, topdown=True):
        for subfolders in dirs:
            print(os.path.sep + subfolders)
            for subpath, subdirs, sweepfiles in os.walk(MainFolder + os.path.sep + subfolders + os.path.sep, topdown=True):
                counter = 1
                for dataFromFiles in LoadData(sweepfiles, subfolders):
                    print()
                    t_LD = dataFromFiles['TimeMATH']
                    y_LD = dataFromFiles['VoltsMATH']
                    fitting_LD = fit_sin_LD(t_LD, y_LD)
                    t_APD = dataFromFiles['TimeCH4']
                    y_APD = dataFromFiles['VoltsCH4']
                    fitting_APD = fit_sin_APD(t_APD, y_APD)
                    Phi = np.array([[fitting_LD["phase_LD"], fitting_APD["phase_APD"]]])
                    df = pd.DataFrame(Phi, columns=["phase_LD", "phase_APD"])
                    print(df)
                    print('_'*50)
                    while not sweepfiles[counter].endswith('.csv'):
                        counter = counter + 1
                    print('The sweepfile is:', sweepfiles[counter])
                    counter = counter + 1
                    print('Phase_Shift:', fitting_LD["phase_LD"] - fitting_APD["phase_APD"])
                    print('='*30)

Plots()
And I have this output:
You are working on this directory:
'XXX/' + DATE + '/' + TEST
\Run_2
The sweeps in the folder are:
\TEST_sweep_1.csv
\TEST_sweep_2.csv
\TEST_sweep_3.csv
\TEST_sweep_4.csv
\TEST_sweep_5.csv
   phase_LD  phase_APD
0  0.799186   0.787802
__________________________________________________
The sweepfile is: TEST_sweep_1.csv
Phase_Shift: 0.01138438229758243
==============================
   phase_LD  phase_APD
0  0.826551   0.810993
__________________________________________________
The sweepfile is: TEST_sweep_2.csv
Phase_Shift: 0.015558041120443344
==============================
   phase_LD  phase_APD
0  0.834952   0.811156
__________________________________________________
The sweepfile is: TEST_sweep_3.csv
Phase_Shift: 0.023795986346148656
==============================
   phase_LD  phase_APD
0  0.856211   0.842482
__________________________________________________
The sweepfile is: TEST_sweep_4.csv
Phase_Shift: 0.013728505278350567
==============================
   phase_LD  phase_APD
0  0.856638   0.833881
__________________________________________________
The sweepfile is: TEST_sweep_5.csv
Phase_Shift: 0.022756757048449816
==============================
I would like to obtain a single DataFrame after the loop with all the collected data (concatenated/appended), so I can keep working with a smaller data set. I mean something like this:
   phase_LD  phase_APD
0  0.799186   0.787802
1  0.826551   0.810993
2  0.834952   0.811156
3  0.856211   0.842482
4  0.856638   0.833881
Any tips? Thanks!
Upvotes: 0
Views: 87
Reputation: 149075
I would store the partial dataframes in a list, and then concat them all:
...
elts = []
for root, dirs, files in os.walk(MainFolder, topdown=True):
    for subfolders in dirs:
        ...
        for dataFromFiles in LoadData(sweepfiles, subfolders):
            ...
            df = pd.DataFrame(Phi, columns=["phase_LD", "phase_APD"])
            elts.append(df)
            ...
final_df = pd.concat(elts, ignore_index=True)  # ignore_index renumbers the rows 0..n-1 instead of repeating index 0
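As a minimal, self-contained sketch of the same idea, including the CSV export the question asks about: the phase values below are copied from the first three sweeps in the question's output and stand in for the results of fit_sin_LD / fit_sin_APD, and the file name phases.csv is only a placeholder.

```python
import pandas as pd

# Phase pairs copied from the question's output; in the real script these
# would come from fit_sin_LD / fit_sin_APD inside the loop.
fitted_phases = [
    (0.799186, 0.787802),
    (0.826551, 0.810993),
    (0.834952, 0.811156),
]

elts = []
for phase_ld, phase_apd in fitted_phases:
    # One single-row DataFrame per sweep, as in the original loop.
    df = pd.DataFrame([[phase_ld, phase_apd]], columns=["phase_LD", "phase_APD"])
    elts.append(df)

# Concatenate once, after the loop; ignore_index=True renumbers rows 0..n-1.
final_df = pd.concat(elts, ignore_index=True)

# index=False keeps the row numbers out of the file.
out_path = "phases.csv"  # placeholder file name
final_df.to_csv(out_path, index=False)
```

Concatenating once after the loop is usually cheaper than growing a DataFrame inside it, because each concatenation copies the data.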
Upvotes: 2