T33C
T33C

Reputation: 4429

Aggregate n DataFrames into a single DataFrame - element-wise

What is the pythonic way to generate a pandas DataFrame using an aggregation function to combine several other DataFrames? (DataFrames are all of the same shape with same headers.)

Aggregation functions I might be interested in are mean, minimum, maximum and user defined functions taking two operands.

For example, this is the way I might generate a DataFrame (max) which has the maximum element from four other DataFrames (df1,df2,df3,df4). This is hardly a pattern (idiom) that I can apply in all situations and to calculate the mean would look quite different.

import pandas as pd

df1 = pd.DataFrame({'a' : [1,2,3], 'b': [3,9,5]})
df2 = pd.DataFrame({'a' : [6,2,7], 'b': [3,4,5]})
df3 = pd.DataFrame({'a' : [6,2,11], 'b': [3,4,5]})
df4 = pd.DataFrame({'a' : [6,2,7], 'b': [3,12,5]})
print(df1)
print(df2)
print(df3)
print(df4)

max1 = df1.where(df1 > df2, df2)
max2 = df3.where(df3 > df4, df4)
max = max1.where(max1 > max2, max2)
print(max)

An example of a user defined function taking two operands which I will also want to apply, is the following which is essentially the same as max in this example but will be something more complicated in practice

def myagg(a, b):
    return a > b

Upvotes: 1

Views: 64

Answers (1)

cs95
cs95

Reputation: 402922

Concatenate them all with pd.concat and then groupby to find max:

pd.concat([df1, df2, df3, df4], 1).groupby(level=0, axis=1).max()

    a   b
0   6   3
1   2  12
2  11   5

Compare this to your method which also returns:

max

    a   b
0   6   3
1   2  12
2  11   5

Note that the column names across every dataframe would have to be the same for this (as well as your current method) to work.


As piR mentioned, the describe function performs some general aggregations for you:

lodf = [df1, df2, df3, df4]

pd.concat(lodf, keys=range(len(lodf))) \
  .groupby(level=1).describe().swaplevel(0, 1, 1).sort_index(1)

    25%       50%       75%       count        max        mean        min  \
      a    b    a    b    a     b     a    b     a     b     a     b    a   
0  4.75  3.0  6.0  3.0  6.0  3.00   4.0  4.0   6.0   3.0  4.75  3.00  1.0   
1  2.00  4.0  2.0  6.5  2.0  9.75   4.0  4.0   2.0  12.0  2.00  7.25  2.0   
2  6.00  5.0  7.0  5.0  8.0  5.00   4.0  4.0  11.0   5.0  7.00  5.00  3.0   

             std            
     b         a         b  
0  3.0  2.500000  0.000000  
1  4.0  0.000000  3.947573  
2  5.0  3.265986  0.000000  

Upvotes: 2

Related Questions