Reputation: 4429
What is the pythonic way to generate a pandas DataFrame using an aggregation function to combine several other DataFrames? (DataFrames are all of the same shape with same headers.)
Aggregation functions I might be interested in are mean, minimum, maximum and user defined functions taking two operands.
For example, this is the way I might generate a DataFrame (max
) which has the maximum element from four other DataFrames (df1,df2,df3,df4
). This is hardly a pattern (idiom) that I can apply in all situations and to calculate the mean would look quite different.
import pandas as pd
df1 = pd.DataFrame({'a' : [1,2,3], 'b': [3,9,5]})
df2 = pd.DataFrame({'a' : [6,2,7], 'b': [3,4,5]})
df3 = pd.DataFrame({'a' : [6,2,11], 'b': [3,4,5]})
df4 = pd.DataFrame({'a' : [6,2,7], 'b': [3,12,5]})
print(df1)
print(df2)
print(df3)
print(df4)
max1 = df1.where(df1 > df2, df2)
max2 = df3.where(df3 > df4, df4)
max = max1.where(max1 > max2, max2)
print(max)
An example of a user defined function taking two operands which I will also want to apply, is the following which is essentially the same as max in this example but will be something more complicated in practice
def myagg(a, b):
return a > b
Upvotes: 1
Views: 64
Reputation: 402922
Concatenate them all with pd.concat
and then groupby
to find max
:
pd.concat([df1, df2, df3, df4], 1).groupby(level=0, axis=1).max()
a b
0 6 3
1 2 12
2 11 5
Compare this to your method which also returns:
max
a b
0 6 3
1 2 12
2 11 5
Note that the column names across every dataframe would have to be the same for this (as well as your current method) to work.
As piR mentioned, the describe
function performs some general aggregations for you:
lodf = [df1, df2, df3, df4]
pd.concat(lodf, keys=range(len(lodf))) \
.groupby(level=1).describe().swaplevel(0, 1, 1).sort_index(1)
25% 50% 75% count max mean min \
a b a b a b a b a b a b a
0 4.75 3.0 6.0 3.0 6.0 3.00 4.0 4.0 6.0 3.0 4.75 3.00 1.0
1 2.00 4.0 2.0 6.5 2.0 9.75 4.0 4.0 2.0 12.0 2.00 7.25 2.0
2 6.00 5.0 7.0 5.0 8.0 5.00 4.0 4.0 11.0 5.0 7.00 5.00 3.0
std
b a b
0 3.0 2.500000 0.000000
1 4.0 0.000000 3.947573
2 5.0 3.265986 0.000000
Upvotes: 2