Reputation: 21
I have several DataFrames (DataFrames have the same index and column structure). The problem is that there are NaN values in these dataframes.
I want to replace these NaN values by mean value of other's DataFrames' corresponding values.
For exapmle let's look at 3 dataframes.
DataFrame1 with 1:M2
NaN :
M1 M2 M3
0 1 1 2
1 8 NaN 9
2 4 2 7
3 9 6 3
DataFrame 2 with NaN value at 0:M3
:
M1 M2 M3
0 2 3 NaN
1 1 1 6
2 1 2 9
3 4 6 2
DataFrame3:
M1 M2 M3
0 1 4 2
1 2 9 1
2 1 6 5
3 1 NaN 4
So we replace NaN in first DataFrame by 5 (9+1)/2
. Second NaN should be replaced by 2 because (2+2)/2
, third by 6 and so on.
Is there any good and elegant way to do it?
Upvotes: 1
Views: 106
Reputation: 164623
This is one way using numpy.nanmean
.
avg = np.nanmean([df1.values, df2.values, df3.values], axis=0)
for df in [df1, df2, df3]:
df[df.isnull()] = avg
df = df.astype(int)
Note: since np.nan
is float
, we convert explicitly back to int
.
Upvotes: 1
Reputation: 323226
We can concat
, then using groupby
fillna
, after split should get what you need
s=pd.concat([df1,df2,df3],keys=[1,2,3])
s=s.groupby(level=1).apply(lambda x : x.fillna(x.mean()))
df1,df2,df3=[x.reset_index(level=0,drop=True) for _,x in s.groupby(level=0)]
df1
Out[1737]:
M1 M2 M3
0 1 1.0 2.0
1 8 5.0 9.0
2 4 2.0 7.0
3 9 6.0 3.0
Upvotes: 1