Mean and standard deviation with multiple dataframes

Question

I have multiple dataframes having the same columns and the same number of observations: For example

d1 = {'ID': ['A','B','C','D'], 'Amount': 
    [1,2,3,4]}
df1 =pd.DataFrame(data=d1)

d2 = {'ID': ['A','B','C','D'], 'Amount': 
    [6,0,1,5]}
df2 =pd.DataFrame(data=d2)

d3 = {'ID': ['A','B','C','D'], 'Amount': 
    [8,1,2,3]}
df3 =pd.DataFrame(data=d3)

I need to drop one column (D) and its corresponding value in each of the dataframes and then, for each variable, calculating the mean and standard deviation. The expected output should be

  avg   std
A   5    ...
B  ...   ...
C  ...   ...

Generally, for one dataframe, I would use drop columns and then I would compute the average using mean() and the standard deviation std().
How can I do this in an easy and fast way with multiple dataframes? (I have at least 10 of them).

jezrael · Accepted Answer

Use concat with remove D in DataFrame.query and aggregate by GroupBy.agg with named aggregations:

df = (pd.concat([df1, df2, df3])
        .query('ID != "D"')
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std')))
print (df)
    avg       std
ID               
A     5  3.605551
B     1  1.000000
C     2  1.000000

Or remove D in last step by DataFrame.drop:

df = (pd.concat([df1, df2, df3])
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
        .drop('D'))

Mean and standard deviation with multiple dataframes

Answers (2)

Related Questions