user11086563
user11086563

Reputation:

Replace zero valued columns in one data frame with mean values of same name column in another data frame

I have two data frames, df1 and df2, each with same number of columns & same column names, but with different number of rows. Basically, there are many columns in df2 which have all 0 values.

What I would like to accomplish is that all columns in df2 which are zero valued are replaced with the mean (average) value of the same column name (as in df1).

So, if df1 has a structure like:-

Column1 Column2 ------    Column n
0.4      2.3               1.7
0.7      2.5               1.4
0.1      2.1               1.2

and df2 has a structure like:-

Column1 Column2 ------    Column n
0      2.3                1.7
0      2.5               1.4
0      2.1               1.2

I would like to replace column1 (and any other all-zero columns in df2) with the mean of the same column mapped in df1. So, finally, df2 would look like:-

Column1 Column2 ------    Column n
0.4      2.3               1.7
0.4      2.5               1.4
0.4      2.1               1.2

(All zero values in column 1 of df2 replaced with mean of column 1 in df1.

I am fairly new to this and have checked other options such as fillna() and replace(), but am unable to accomplish exactly what I want. Any help in this regard is highly appreciated.

Upvotes: 2

Views: 145

Answers (1)

jezrael
jezrael

Reputation: 862751

Use DataFrame.mask with mean:

df = df2.mask(df2 == 0, df1.mean(), axis=1)
print (df)
   Column1  Column2  Column n
0      0.4      2.3       1.7
1      0.4      2.5       1.4
2      0.4      2.1       1.2

numpy alternative with numpy.where should working faster in large DataFrames:

df = pd.DataFrame(np.where(df2 == 0, df1.mean(), df1), 
                  index=df1.index,
                  columns=df1.columns)
print (df)
   Column1  Column2  Column n
0      0.4      2.3       1.7
1      0.4      2.5       1.4
2      0.4      2.1       1.2

Upvotes: 3

Related Questions