Reputation: 21574
I've got several dataframes with the same columns but different numbers of rows, such as df0:
lang,h,H
ar,2,2
en,1,2
es,3,4
id,4,2
and df1:
lang,h,H
ar,2,2
en,2,2
es,2,3
Those dataframes are .csv files stored in a directory; their names are 'df' + i + '.csv', where i is in range(10). I would like to read all the files and then take the mean of each column. So far I have tried the following, reading them one by one:
df0 = pd.read_csv('df0.csv', index_col='lang')
df1 = pd.read_csv('df1.csv', index_col='lang')
then concatenating and taking the mean:
df = pd.concat((df0, df1), axis=1).mean(axis=1)
which returns:
ar 2.00
en 1.75
es 3.00
id 3.00
dtype: float64
How can I read all the files stored in the directory in a loop and get the mean of each column across the dataframes? The code above collapses h and H into a single Series; instead I would like a dataframe that keeps the h and H columns with their mean values.
EDIT: This is the expected output dataframe:
lang,mean_h,mean_H
ar,2,2
en,1.50,2
es,2.50,3.50
id,4,2
Upvotes: 0
Views: 1308
Reputation: 1402
Get all the files in the directory using the glob module:
import glob
import pandas as pd

myFiles = glob.glob('C://my_folder//*.csv')
Loop through each file and add the resulting dataframe to the tuple dfs:
dfs = ()
for file in myFiles:
    # read each csv, using the lang column as the index
    df = pd.read_csv(file, index_col='lang')
    dfs = dfs + (df,)
Finally, concatenate them row-wise and calculate the mean for each index value:
# stack all the rows, then average h and H for each lang
df = pd.concat(dfs, axis=0)
df = df.groupby(df.index).mean()
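If the averaged columns should be called mean_h and mean_H as in the question's expected output, one extra step (a sketch on top of the answer above, not part of the original) is to prefix the column names:

# rename h/H to mean_h/mean_H to match the expected output
df = df.add_prefix('mean_')
print(df)

With the two example files from the question, this gives mean_h = 1.5 and mean_H = 2 for en, matching the expected output.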
Upvotes: 1