mpy

Reputation: 632

Column wise mean with condition pandas

I have a sample dataframe as follow:

data = pd.DataFrame({'Date':[20210101,20210102,20210103,20210104,20210105],'coef1':[1,2,5,4,3],'coef2':[1,1,2,6,3],'coef3':[1,1,1,1,1]})

I would like to compute the row-wise mean over 'coef1', 'coef2' and 'coef3', ignoring any value that equals 1.

My desired output is the same dataframe with the resulting mean added as a new column.

I wrote a function, applied it to my dataframe, and got my desired output; however, I want a more Pythonic way to achieve this.

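def final_coef(x):
    # Collect the coefficients of this row that are not equal to 1.
    coef_list = []
    if x['coef1'] == 1:
        pass
    else:
        coef_list.append(x['coef1'])
    if x['coef2'] == 1:
        pass
    else:
        coef_list.append(x['coef2'])
    if x['coef3'] == 1:
        pass
    else:
        coef_list.append(x['coef3'])
    # np.mean of an empty list is NaN (NumPy emits a RuntimeWarning).
    return np.mean(coef_list)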

data['Final_coef'] = data.apply(lambda row: final_coef(row),axis = 1)

Upvotes: 3

Views: 445

Answers (2)

ggaurav

Reputation: 1804

data['final_coef'] = (
    data[['coef1', 'coef2', 'coef3']][data[['coef1', 'coef2', 'coef3']] != 1]
    .mean(axis=1)
)
data
       Date  coef1  coef2  coef3  final_coef
0  20210101      1      1      1         NaN
1  20210102      2      1      1         2.0
2  20210103      5      2      1         3.5
3  20210104      4      6      1         5.0
4  20210105      3      3      1         3.0

Explanation:

This generates the required mask. Note that the Date column does not need to be considered at any point:

data[['coef1', 'coef2', 'coef3']] != 1

    coef1   coef2   coef3
0   False   False   False
1   True    False   False
2   True    True    False
3   True    True    False
4   True    True    False

Then you can select the corresponding data points in the following way:

data[['coef1', 'coef2', 'coef3']][data[['coef1', 'coef2', 'coef3']] != 1]

   coef1 coef2  coef3
0   NaN   NaN   NaN
1   2.0   NaN   NaN
2   5.0   2.0   NaN
3   4.0   6.0   NaN
4   3.0   3.0   NaN

Upvotes: 2

mosc9575

Reputation: 6337

This can be done in one line, but there are three steps involved:

  1. replace every value equal to 1 with np.nan using df.where(df.ne(1), np.nan)
  2. calculate the mean of each row (NaNs aren't included) using df[['coef1', 'coef2', 'coef3']].mean(axis=1)
  3. assign the result of this calculation to a new column using df.assign()

Code example (df is the dataframe named data in the question):

df = df.assign(final_coef=df.where(df.ne(1), np.nan)[['coef1', 'coef2', 'coef3']].mean(axis=1))

>>> df
       Date  coef1  coef2  coef3  final_coef
0  20210101      1      1      1         NaN
1  20210102      2      1      1         2.0
2  20210103      5      2      1         3.5
3  20210104      4      6      1         5.0
4  20210105      3      3      1         3.0
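
The same three steps can also be written as separate statements, which makes each intermediate result easy to inspect. This is a hedged sketch rather than the answer's exact code: it selects the coefficient columns before calling where, so the Date column is never compared against 1, and the names coef_cols, no_ones and row_mean are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, named df as in this answer.
df = pd.DataFrame({
    'Date': [20210101, 20210102, 20210103, 20210104, 20210105],
    'coef1': [1, 2, 5, 4, 3],
    'coef2': [1, 1, 2, 6, 3],
    'coef3': [1, 1, 1, 1, 1],
})

coef_cols = ['coef1', 'coef2', 'coef3']  # hypothetical helper name

# Step 1: keep values that differ from 1; where() fills the rest with NaN.
no_ones = df[coef_cols].where(df[coef_cols].ne(1))

# Step 2: row-wise mean; NaN entries are skipped by default.
row_mean = no_ones.mean(axis=1)

# Step 3: assign() returns a new frame with the extra column.
df = df.assign(final_coef=row_mean)
```

Selecting the columns first is a design choice, not a requirement: with this sample data the one-line version gives the same result.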

Upvotes: 2
