Reputation: 867
I have a data set that is created from another data set. In my new DataFrame, some columns have NaN values. I want to take the log of each column. However, I need to keep all the rows, even those with NaN values. What should I do with the NaN values before applying the log? For example, consider the following data set:
a b c
1 2 3
4 5 6
7 nan 8
9 nan nan
I do not want to drop the rows with NaN values; I need to apply the log to them as well.
For example, I need to keep the values 7 and 8 in the third row. Thanks.
Upvotes: 0
Views: 423
Reputation: 294198
Having NaN won't affect the log, because it is calculated for each individual cell and NaN cells simply stay NaN. What's more, np.log will operate on a pd.DataFrame and return a pd.DataFrame:
np.log(df)
a b c
0 0.000000 0.693147 1.098612
1 1.386294 1.609438 1.791759
2 1.945910 NaN 2.079442
3 2.197225 NaN NaN
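For anyone who wants to reproduce this, here is a minimal sketch. It assumes the question's table is rebuilt by hand as a DataFrame with np.nan in the missing cells; the variable name logged is just for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example table from the question, with np.nan for missing cells.
df = pd.DataFrame({
    "a": [1, 4, 7, 9],
    "b": [2, 5, np.nan, np.nan],
    "c": [3, 6, 8, np.nan],
})

# Element-wise natural log; NaN cells stay NaN and no rows are dropped.
logged = np.log(df)

print(logged)
print(logged.shape)                      # same shape as df
print(logged.isna().equals(df.isna()))   # NaN positions are unchanged
```

The row with 7 and 8 keeps its two computed values, and only the cell that was NaN to begin with remains NaN.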
Notice the difference in timing between the three approaches:
%timeit np.log(df)
%timeit pd.DataFrame(np.log(df.values), df.index, df.columns)
%timeit df.applymap(np.log)
134 µs ± 5.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
107 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
835 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Response to @IanS
Notice the subok=True parameter in the documentation. It controls whether the original type is preserved. If we set it to False, we get a plain array back:
np.log(df, subok=False)
array([[ 0. , 0.69314718, 1.09861229],
[ 1.38629436, 1.60943791, 1.79175947],
[ 1.94591015, nan, 2.07944154],
[ 2.19722458, nan, nan]])
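If a plain NumPy array is what you are after, another route that does not rely on subok is to convert the frame first. This is only a sketch of that alternative, not what was timed above; it assumes the same example frame and uses DataFrame.to_numpy().

```python
import numpy as np
import pandas as pd

# Same example table as in the question.
df = pd.DataFrame({
    "a": [1, 4, 7, 9],
    "b": [2, 5, np.nan, np.nan],
    "c": [3, 6, 8, np.nan],
})

# Convert to a plain ndarray first, then take the log element-wise.
arr = np.log(df.to_numpy())

print(type(arr))
print(arr)
```

The index and column labels are lost this way, so wrap the result back in pd.DataFrame(arr, df.index, df.columns) if you need them.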
Upvotes: 3