user2512443
user2512443

Reputation: 473

Average of Dataframes

Suppose I have three dataframes with the same index and same columns names (some columns are strings others are numeric): df1, df2, df3.

df1

Hour | X1 | X2 | X3 | X4
 0   | 10 | H | 2   | -1  
 1   | 10 | b | 1   | 0 
 2   | 10 | D | 1   | 0 
 3   | 20 | L | 1   | 0

df2

Hour | X1 | X2 | X3 | X4
 0   | 10 | H | 1   | 1  
 1   | 10 | b | 1   | 0 
 2   | 10 | D | 1   | 0 
 3   | 20 | L | 2   | 0

df3

Hour | X1 | X2 | X3 | X4
 0   | 10 | H  | 0  |  0  
 1   | 10 | b  | 0  |  0 
 2   | 10 | D  | 0  |  0 
 3   | 20 | L  | 0  |  30

I want a new dataframe that is the average of these dataframes for numeric values but keep the string colum as is (all string columns are the same).

df4 =

Hour | X1 | X2 | X3 | X4
 0   | 10 | H | 1 | 0  
 1   | 10 | b | 1 | 0 
 2   | 10 | D | 1 | 0 
 3   | 20 | L | 1 | 10

I prefer to do it using add function as I have to write a for loop to access each data frame. Thanks!

Upvotes: 1

Views: 111

Answers (2)

jpp
jpp

Reputation: 164623

You can use select_dtypes to separate numeric and non-numeric series. Then use pd.concat with groupby + mean. Finally, join fixed series and reindex columns.

fixed = df1.select_dtypes(include=['object'])
numcols = df1.columns.difference(fixed.columns)

res = pd.concat([df1, df2, df3]).reindex(columns=numcols)\
        .groupby(level=0).mean()\
        .join(fixed).reindex(columns=df1.columns)

print(res)

   Hour    X1   X2        X3     X4
0    0.0  10.0   H   1.000000   0.0
1    1.0  10.0   b   0.666667   0.0
2    2.0  10.0   D   0.666667   0.0
3    3.0  20.0   L   1.000000  10.0

Upvotes: 2

BENY
BENY

Reputation: 323226

Doing with concat then agg

pd.concat([df1,df2,df3]).groupby(level=0).agg(lambda x : x.mean() if x.dtype=='int64' else x.head(1))

Upvotes: 4

Related Questions