LdM
LdM

Reputation: 704

Mean and standard deviation with multiple dataframes

I have multiple dataframes having the same columns and the same number of observations: For example

d1 = {'ID': ['A','B','C','D'], 'Amount': 
    [1,2,3,4]}
df1 =pd.DataFrame(data=d1)

d2 = {'ID': ['A','B','C','D'], 'Amount': 
    [6,0,1,5]}
df2 =pd.DataFrame(data=d2)

d3 = {'ID': ['A','B','C','D'], 'Amount': 
    [8,1,2,3]}
df3 =pd.DataFrame(data=d3)

I need to drop one column (D) and its corresponding value in each of the dataframes and then, for each variable, calculating the mean and standard deviation. The expected output should be

  avg   std
A   5    ...
B  ...   ...
C  ...   ...

Generally, for one dataframe, I would use drop columns and then I would compute the average using mean() and the standard deviation std().
How can I do this in an easy and fast way with multiple dataframes? (I have at least 10 of them).

Upvotes: 3

Views: 1817

Answers (2)

Arkadiusz
Arkadiusz

Reputation: 1875

You can use pivot_table as well:

import numpy as np

pd.concat([df1, df2, df3]).pivot_table(index='ID', aggfunc=[np.mean, np.std]).drop('D')

Upvotes: 3

jezrael
jezrael

Reputation: 863751

Use concat with remove D in DataFrame.query and aggregate by GroupBy.agg with named aggregations:

df = (pd.concat([df1, df2, df3])
        .query('ID != "D"')
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std')))
print (df)
    avg       std
ID               
A     5  3.605551
B     1  1.000000
C     2  1.000000

Or remove D in last step by DataFrame.drop:

df = (pd.concat([df1, df2, df3])
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
        .drop('D'))

Upvotes: 5

Related Questions