Reputation: 21574
I've got several dataframes with the same columns but different numbers of rows, such as df0:
lang,h,H
ar,2,2
en,1,2
es,3,4
id,4,2
and df1:
lang,h,H
ar,2,2
en,2,2
es,2,3
Those dataframes are .csv files stored in a directory; their names are 'df' + i + '.csv', where i is in range(10). I would like to read all the files and then take the mean of each column. So far I have tried the following, reading them one by one:
df0 = pd.read_csv('df0.csv', index_col='lang')
df1 = pd.read_csv('df1.csv', index_col='lang')
then concatenating and taking the mean:
df = pd.concat((df0, df1), axis=1).mean(axis=1)
which returns:
ar 2.00
en 1.75
es 3.00
id 3.00
dtype: float64
How can I read all the files stored in the directory in a loop and get the mean of each column across the dataframes? The code above collapses h and H into a single Series; instead I would like a dataframe that keeps the h and H columns with their mean values.
EDIT: This is the expected output dataframe:
lang,mean_h,mean_H
ar,2,2
en,1.50,2
es,2.50,3.50
id,4,2
Upvotes: 0
Views: 1308
Reputation: 1402
Get all the files in the directory using the glob module:
import glob
import pandas as pd

myFiles = glob.glob('C://my_folder//*.csv')
Loop through each file and add the resulting dataframe to the tuple dfs:
dfs = ()
for file in myFiles:
    # read each csv, using the lang column as the index
    df = pd.read_csv(file, index_col='lang')
    dfs = dfs + (df,)
Finally, concatenate them row-wise and calculate the mean for each index value:
# stack all the rows, then average h and H for each lang
df = pd.concat(dfs, axis=0)
df = df.groupby(df.index).mean()
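If the averaged columns should be called mean_h and mean_H as in the question's expected output, one extra step (a sketch on top of the answer above, not part of the original) is to prefix the column names:

# rename h/H to mean_h/mean_H to match the expected output
df = df.add_prefix('mean_')
print(df)

With the two example files from the question, this gives mean_h = 1.5 and mean_H = 2 for en, matching the expected output.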
Upvotes: 1