Reputation: 704
I have multiple dataframes having the same columns and the same number of observations: For example
d1 = {'ID': ['A','B','C','D'], 'Amount':
[1,2,3,4]}
df1 =pd.DataFrame(data=d1)
d2 = {'ID': ['A','B','C','D'], 'Amount':
[6,0,1,5]}
df2 =pd.DataFrame(data=d2)
d3 = {'ID': ['A','B','C','D'], 'Amount':
[8,1,2,3]}
df3 =pd.DataFrame(data=d3)
I need to drop one column (D) and its corresponding value in each of the dataframes and then, for each variable, calculating the mean and standard deviation. The expected output should be
avg std
A 5 ...
B ... ...
C ... ...
Generally, for one dataframe, I would use drop columns and then I would compute the average using mean() and the standard deviation std().
How can I do this in an easy and fast way with multiple dataframes? (I have at least 10 of them).
Upvotes: 3
Views: 1817
Reputation: 1875
You can use pivot_table as well:
import numpy as np
pd.concat([df1, df2, df3]).pivot_table(index='ID', aggfunc=[np.mean, np.std]).drop('D')
Upvotes: 3
Reputation: 863751
Use concat
with remove D
in DataFrame.query
and aggregate by GroupBy.agg
with named aggregations:
df = (pd.concat([df1, df2, df3])
.query('ID != "D"')
.groupby('ID')
.agg(avg=('Amount', 'mean'), std=('Amount', 'std')))
print (df)
avg std
ID
A 5 3.605551
B 1 1.000000
C 2 1.000000
Or remove D
in last step by DataFrame.drop
:
df = (pd.concat([df1, df2, df3])
.groupby('ID')
.agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
.drop('D'))
Upvotes: 5