How to calculate the mean and standard deviation of multiple dataframes at one go?

Question

I've several hundreds of pandas dataframes and And the number of rows are not exactly the same in all the dataframes like some have 600 but other have 540 only.

So what i want to do is like, i have two samples of exactly the same numbers of dataframes and i want to read all the dataframes(around 2000) from both the samples. So that's how thee data looks like and i can read the files like this:

5113.440  1     0.25846     0.10166    27.96867     0.94852    -0.25846   268.29305     5113.434129
5074.760  3     0.68155     0.16566   120.18771     3.02654    -0.68155   101.02457     5074.745627
5083.340  2     0.74771     0.13267   105.59355     2.15700    -0.74771   157.52406     5083.337081
5088.150  1     0.28689     0.12986    39.65747     2.43339    -0.28689   164.40787     5088.141849
5090.780  1     0.61464     0.14479    94.72901     2.78712    -0.61464   132.25865     5090.773443

#first Sample
path_to_files = '/home/Desktop/computed_2d_blaze/'
lst = []
for filen in [x for x in os.listdir(path_to_files) if '.ares' in x]:
   df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
   df = df.sort_values('stlines', ascending=False)
   df = df.drop_duplicates('wave')
   df = df.reset_index(drop=True)
   lst.append(df)


#second sample

path_to_files1 = '/home/Desktop/computed_1d/'
lst1 = []
for filen in [x for x in os.listdir(path_to_files1) if '.ares' in x]:
   df1 = pd.read_table(path_to_files1+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
   df1 = df1.sort_values('stlines', ascending=False)
   df1 = df1.drop_duplicates('wave')
   df1 = df1.reset_index(drop=True)
   lst1.append(df1)

Now the data is stored in lists and as the number of rows in all the dataframes are not same so i cant subtract them directly.

So how can i subtract them correctly?? And after that i want to take average(mean) of the residual to make a dataframe?

PMende · Accepted Answer

You shouldn't use apply. Just use Boolean making:

mask = df['waves'].between(lower_outlier, upper_outlier)
df[mask].plot(x='waves', y='stlines')

How to calculate the mean and standard deviation of multiple dataframes at one go?

Answers (2)

Related Questions