Reputation: 473
Suppose I have three dataframes with the same index and same columns names (some columns are strings others are numeric): df1, df2, df3.
df1
Hour | X1 | X2 | X3 | X4
0 | 10 | H | 2 | -1
1 | 10 | b | 1 | 0
2 | 10 | D | 1 | 0
3 | 20 | L | 1 | 0
df2
Hour | X1 | X2 | X3 | X4
0 | 10 | H | 1 | 1
1 | 10 | b | 1 | 0
2 | 10 | D | 1 | 0
3 | 20 | L | 2 | 0
df3
Hour | X1 | X2 | X3 | X4
0 | 10 | H | 0 | 0
1 | 10 | b | 0 | 0
2 | 10 | D | 0 | 0
3 | 20 | L | 0 | 30
I want a new dataframe that is the average of these dataframes for numeric values but keep the string colum as is (all string columns are the same).
df4 =
Hour | X1 | X2 | X3 | X4
0 | 10 | H | 1 | 0
1 | 10 | b | 1 | 0
2 | 10 | D | 1 | 0
3 | 20 | L | 1 | 10
I prefer to do it using add function as I have to write a for loop to access each data frame. Thanks!
Upvotes: 1
Views: 111
Reputation: 164623
You can use select_dtypes
to separate numeric and non-numeric series. Then use pd.concat
with groupby
+ mean
. Finally, join fixed series and reindex columns.
fixed = df1.select_dtypes(include=['object'])
numcols = df1.columns.difference(fixed.columns)
res = pd.concat([df1, df2, df3]).reindex(columns=numcols)\
.groupby(level=0).mean()\
.join(fixed).reindex(columns=df1.columns)
print(res)
Hour X1 X2 X3 X4
0 0.0 10.0 H 1.000000 0.0
1 1.0 10.0 b 0.666667 0.0
2 2.0 10.0 D 0.666667 0.0
3 3.0 20.0 L 1.000000 10.0
Upvotes: 2
Reputation: 323226
Doing with concat
then agg
pd.concat([df1,df2,df3]).groupby(level=0).agg(lambda x : x.mean() if x.dtype=='int64' else x.head(1))
Upvotes: 4