Reputation: 467

Performing operations on column with nan's without removing them

I currently have a data frame like so:

treated	control
9.5	9.6
10	5
6	0
6	6

I want to get the log2 ratio between treated and control, i.e. log2(treated/control). However, math.log2() breaks because of the 0 values in the control column (a division by zero). Ideally, I would like to compute the log2 ratio using method chaining, e.g. with df.assign(), and simply put NaNs where it is not possible, like so:

treated	control	log_2_ratio
9.5	9.6	-0.0151
10	5	1
6	0	nan
6	6	0

I have managed to do this in an extremely roundabout way, where I have:

made a column ratio, which is treated/control
run new_df = df.dropna() on this dataframe
applied log2 to this
left-joined it back to the original df

As always, any help is very much appreciated :)

Upvotes: 1

Answers (3)

RomanPerekhrest

Reputation: 92854

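To avoid a subsequent replacement, you can use an explicit condition (bearing in mind that a product involving zero is always 0, so x.treated * x.control is falsy whenever either value is zero). Note that np.where still evaluates np.log2 on every row; the condition only selects which results are kept.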

df.assign(log_2_ratio=lambda x: np.where(x.treated * x.control, np.log2(x.treated/x.control), np.nan))

Out[22]: 
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
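
A variation on the same idea, as a minimal sketch: masking the zero denominators before dividing means the invalid ratio is never computed, and NaN simply propagates through the division and the log. The frame is rebuilt from the question's sample data; using `Series.where` for the mask is my own assumption of a reasonable alternative, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"treated": [9.5, 10, 6, 6], "control": [9.6, 5, 0, 6]})

# Series.where keeps values where the condition holds and puts NaN elsewhere,
# so zero controls become NaN before the division happens.
result = df.assign(
    log_2_ratio=lambda x: np.log2(x["treated"] / x["control"].where(x["control"] != 0))
)
print(result)
```

This stays inside a single `assign` call, so it chains with other methods in the way the question asks for.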

Upvotes: 1

tdelaney

Reputation: 77347

Stick with the numpy log functions and you'll get an inf in the cells where the divide doesn't work. That seems like a better choice than nan anyway.

>>> df["log_2_ratio"] = np.log2(df.treated/df.control)
>>> df
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          inf
3      6.0      6.0     0.000000

Upvotes: 0

mozway

Reputation: 260835

You need to replace the inf with nan:

df.assign(log_2_ratio=np.log2(df['treated'].div(df['control'])).replace(np.inf, np.nan))

Output:

   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
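
One caveat: replacing only `np.inf` leaves `-inf` in place, which is what you get when treated is 0 and control is not. A hedged sketch that covers both signs and stays chainable is below; the fifth row is a hypothetical addition to the question's data, included only to trigger the `-inf` case.

```python
import numpy as np
import pandas as pd

# Question's sample data plus one hypothetical row with treated == 0.
df = pd.DataFrame({"treated": [9.5, 10, 6, 6, 0], "control": [9.6, 5, 0, 6, 3]})

with np.errstate(divide="ignore"):  # silence the log2(0) warning
    result = df.assign(
        log_2_ratio=lambda x: np.log2(x["treated"].div(x["control"]))
        .replace([np.inf, -np.inf], np.nan)  # handle both signs of infinity
    )
print(result)
```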

Upvotes: 1
