KLM117

Reputation: 467

Performing operations on column with nan's without removing them

I currently have a data frame like so:

treated control
9.5 9.6
10 5
6 0
6 6

I want to compute the log 2 ratio between treated and control, i.e. log2(treated/control). However, math.log2() breaks because of the 0 values in the control column (a division by zero). Ideally, I would like to get the log 2 ratio using method chaining, e.g. with df.assign(), and simply put NaN where the ratio cannot be computed, like so:

treated control log_2_ratio
9.5 9.6 -0.0151
10 5 1
6 0 nan
6 6 0

I have managed to do this in an extremely round-about way, where I have:

As always, any help is very much appreciated :)

Upvotes: 1

Views: 484

Answers (3)

RomanPerekhrest

Reputation: 92854

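To avoid a subsequent replacement, you can use an explicit condition. The product treated * control is 0 (and therefore falsy) whenever either value is 0, so np.where puts NaN in those rows and the log ratio everywhere else. Note that this also yields NaN for rows where treated is 0, and that np.where still evaluates the division for every row before discarding the unwanted results.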

df.assign(log_2_ratio=lambda x: np.where(x.treated * x.control, np.log2(x.treated/x.control), np.nan))

Out[22]: 
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
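
Since np.where computes both branches for every row, a variant is to mask the zero denominators before dividing, so the division by zero is never computed. The following is only a sketch under that assumption: it rebuilds the question's data frame, and uses Series.where to turn zeros in control into NaN, which then propagates through the ratio and the log. Zeros in treated are not handled here and would still give -inf.

```python
import numpy as np
import pandas as pd

# Data frame rebuilt from the question.
df = pd.DataFrame({"treated": [9.5, 10, 6, 6], "control": [9.6, 5, 0, 6]})

# Series.where keeps values where the condition holds and puts NaN elsewhere,
# so a zero denominator becomes NaN before the division happens.
result = df.assign(
    log_2_ratio=lambda x: np.log2(x["treated"] / x["control"].where(x["control"] != 0))
)
print(result)
```

This stays inside a single assign() call, so it fits into a longer method chain.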

Upvotes: 1

tdelaney

Reputation: 77347

Stick with the numpy log functions and you'll get an inf in the cells where a positive value is divided by zero. That seems like a better choice than nan anyway.

>>> df["log_2_ratio"] = np.log2(df.treated/df.control)
>>> df
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          inf
3      6.0      6.0     0.000000
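
Which special value appears depends on where the zero sits. The small illustration below uses a made-up three-row frame (not the question's data) to show the three cases: a zero denominator, a zero numerator, and zero over zero. The np.errstate context is only there to silence any floating-point warnings NumPy might emit.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering the three zero cases.
cases = pd.DataFrame({"treated": [6.0, 0.0, 0.0], "control": [0.0, 6.0, 0.0]})

with np.errstate(divide="ignore", invalid="ignore"):
    cases["log_2_ratio"] = np.log2(cases["treated"] / cases["control"])

print(cases)
# Row 0: 6 / 0 -> inf,  log2(inf) -> inf
# Row 1: 0 / 6 -> 0,    log2(0)   -> -inf
# Row 2: 0 / 0 -> NaN,  log2(NaN) -> NaN
```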

Upvotes: 0

mozway

Reputation: 260835

You need to replace the inf with nan:

df.assign(log_2_ratio=np.log2(df['treated'].div(df['control'])).replace(np.inf, np.nan))

Output:

   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
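
If the frame can also contain zeros in treated, the log yields -inf, which replacing np.inf alone leaves untouched. A possible variant, sketched here with the last row of the question's data changed to treated = 0 purely for illustration, replaces both infinities and uses a lambda so that it works in the middle of a method chain:

```python
import numpy as np
import pandas as pd

# Question's data, except the last treated value is set to 0 (an assumption
# made here to show the -inf case).
df = pd.DataFrame({"treated": [9.5, 10, 6, 0], "control": [9.6, 5, 0, 6]})

with np.errstate(divide="ignore"):  # silence a possible log2(0) warning
    out = df.assign(
        log_2_ratio=lambda x: np.log2(x["treated"].div(x["control"]))
        .replace([np.inf, -np.inf], np.nan)
    )
print(out)
```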

Upvotes: 1
