Elham

Reputation: 867

NaN values in columns in Python

I have a data set which is created from another data set. In my new data frame some columns have NaN values. I want to take the log of each column, but I need to keep all the rows, even those containing NaN. What should I do with the NaN values before applying the log? For example, consider the following data set:

a    b     c
1    2     3
4    5     6
7    nan   8
9    nan   nan

I do not want to drop the rows with NaN values; I need to apply the log to them as well.

I need to keep the values 7 and 8 in that row, for example. Thanks.

Upvotes: 0

Views: 423

Answers (1)

piRSquared

Reputation: 294198

Having NaN values won't affect log, which is calculated independently for each cell. What's more, np.log has the property that it will operate on a pd.DataFrame and return a pd.DataFrame:

np.log(df)

          a         b         c
0  0.000000  0.693147  1.098612
1  1.386294  1.609438  1.791759
2  1.945910       NaN  2.079442
3  2.197225       NaN       NaN
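For reference, the frame above can be rebuilt like this (a minimal sketch; the column values are taken from the question, with np.nan standing in for the missing cells):

```python
import numpy as np
import pandas as pd

# Rebuild the asker's example frame, using np.nan for the missing cells
df = pd.DataFrame({
    "a": [1, 4, 7, 9],
    "b": [2, 5, np.nan, np.nan],
    "c": [3, 6, 8, np.nan],
})

# Elementwise log: NaN cells stay NaN, and every row is kept
result = np.log(df)
print(result)
```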

Notice the difference in timing:

%timeit np.log(df)
%timeit pd.DataFrame(np.log(df.values), df.index, df.columns)
%timeit df.applymap(np.log)

134 µs ± 5.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
107 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
835 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
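The gap makes sense: the first two spellings call np.log once on a whole array, while applymap calls it once per cell through a Python-level loop. A quick sketch confirming all three produce the same frame (note that applymap was renamed to DataFrame.map in pandas 2.1, so the sketch uses whichever is available):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7, 9],
                   "b": [2, 5, np.nan, np.nan],
                   "c": [3, 6, 8, np.nan]})

# Vectorized: one np.log call over the whole frame / underlying array
direct = np.log(df)
via_values = pd.DataFrame(np.log(df.values), df.index, df.columns)

# Per-cell: one np.log call per element, hence the slower timing.
# `applymap` was renamed to `DataFrame.map` in pandas 2.1.
if hasattr(df, "map"):
    per_cell = df.map(np.log)
else:
    per_cell = df.applymap(np.log)

# All three agree (NaN positions included)
pd.testing.assert_frame_equal(direct, via_values)
pd.testing.assert_frame_equal(direct, per_cell)
```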

**Response to @IanS**

Notice the subok=True parameter in the documentation

It controls whether the original type is preserved. If we set it to False:

np.log(df, subok=False)

array([[ 0.        ,  0.69314718,  1.09861229],
       [ 1.38629436,  1.60943791,  1.79175947],
       [ 1.94591015,         nan,  2.07944154],
       [ 2.19722458,         nan,         nan]])
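Equivalently, you can drop to the raw ndarray yourself and rewrap it afterwards to restore the index and column labels (a sketch, not part of the original answer; it produces the same values as the subok=False call above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7, 9],
                   "b": [2, 5, np.nan, np.nan],
                   "c": [3, 6, 8, np.nan]})

# Extracting the values first yields a plain ndarray with no labels
arr = np.log(df.to_numpy())

# Rewrap with the original index and columns to get a DataFrame back
rewrapped = pd.DataFrame(arr, index=df.index, columns=df.columns)
```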

Upvotes: 3
