Reputation: 1427
I have a pandas dataframe that looks like this:
I want to take the log of each value in the dataframe.
So that seemed like no problem at first, and then:
data.apply(lambda x:math.log(x))
returned a type error (cannot convert series to class 'float').
Okay, fine--so, while type checking is often frowned upon, I gave it a shot (also tried casting x to a float, same problem):
isinstance((data['A1BG'][0]), np.float64)
returns True, so I tried:
data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)
That ran without any errors, but it didn't change any values in my dataframe.
What am I doing wrong?
Thanks!
Upvotes: 2
Views: 2592
Reputation: 3855
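When you call apply on a DataFrame, the function you pass receives a whole pandas.Series (one column at a time), not a float. This is unlike calling apply on a Series, where the function receives one value at a time. So instead of math.log, which only accepts a single number, you should use np.log, which works on whole arrays.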
EDIT:
With examples it's always better:
import numpy as np
import pandas as pd
# Two columns of five random floats in [0, 1)
test = pd.DataFrame({'a': np.random.random(5), 'b': np.random.random(5)})
a b
0 0.430111 0.420516
1 0.367704 0.785093
2 0.034130 0.839822
3 0.310254 0.755089
4 0.098302 0.136995
If you try the following, it won't work:
test.apply(lambda x: math.log(x))
TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index a')
But this will do the job:
test.apply(lambda x: np.log(x))
a b
0 -0.843711 -0.866273
1 -1.000476 -0.241953
2 -3.377588 -0.174565
3 -1.170364 -0.280919
4 -2.319708 -1.987811
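Since np.log is a NumPy ufunc, it can also be called on the DataFrame directly, with no apply at all. The sketch below is a minimal illustration under assumed data (the values in `test` here are made up, not the random ones above); it compares the direct call with the apply version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; any all-numeric frame of positive values would do.
test = pd.DataFrame({"a": [1.0, np.e, 10.0], "b": [0.5, 2.0, 4.0]})

# NumPy ufuncs accept a DataFrame and return a DataFrame with the same labels.
logged = np.log(test)

# The apply-based version from above, for comparison.
via_apply = test.apply(lambda col: np.log(col))

print(logged)
print(np.allclose(logged, via_apply))
```

The direct call avoids the per-column Python function call, which is usually the simpler choice when the whole frame is numeric.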
Upvotes: 1
Reputation: 402263
What happens is that df.apply passes a pd.Series object to the lambda. It operates over one Series (one column) at a time, not one float at a time.
So, with
data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)
isinstance(x, np.float64) is never true (because x is a pd.Series), so the else branch always runs and every column is returned unchanged.
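One way to see this for yourself is to record the type of whatever the function receives. This is only a sketch: the frame below is a hypothetical stand-in for the asker's data (the A1BG column name comes from the question, the A2M column and all values are made up).

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame; the values are invented.
data = pd.DataFrame({"A1BG": [1.0, 2.0], "A2M": [3.0, 4.0]})

seen = []  # type name of every object handed to the function


def log_if_float(x):
    seen.append(type(x).__name__)
    # x is a whole column here, so the isinstance check fails
    # and the column is returned untouched.
    return math.log(x) if isinstance(x, np.float64) else x


result = data.apply(log_if_float)

print(seen)                 # the type names that were recorded
print(result.equals(data))  # whether the frame came back unchanged
```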
To remedy this, you can operate on one element at a time, using df.applymap:
data.applymap(math.log)
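As a rough sketch of the element-wise route: the function is called once per cell with a scalar, so math.log works. Note that pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map, so the snippet picks whichever one the installed version offers. The frame and its values are made up for illustration.

```python
import math

import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"a": [1.0, 10.0], "b": [100.0, 1000.0]})

# DataFrame.map is the element-wise method in pandas 2.1+;
# older versions only have DataFrame.applymap.
elementwise = data.map if hasattr(pd.DataFrame, "map") else data.applymap

# math.log receives one scalar per call, so no TypeError is raised.
logged = elementwise(math.log)
print(logged)
```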
Using apply, the solution is similar, but the function must accept a whole Series, so use np.log rather than math.log:
data.apply(lambda x: np.log(x))
Or, alternatively (pandas 0.20 and later):
data.transform(lambda x: np.log(x))
Coincidentally, df.applymap is the fastest, followed by df.apply and df.transform.
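Relative timings like this can depend on the frame size and the pandas version, so it may be worth measuring on your own data. Below is a minimal timing sketch, assuming a made-up 1000 x 4 frame of positive floats; the names `candidates` and `results` are just for illustration, and no particular ordering is implied by the code.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: positive floats so the log is defined everywhere.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((1000, 4)) + 0.1, columns=list("abcd"))

# Element-wise method: DataFrame.map in pandas 2.1+, applymap before that.
elementwise = data.map if hasattr(pd.DataFrame, "map") else data.applymap

candidates = {
    "applymap/map": lambda: elementwise(math.log),
    "apply": lambda: data.apply(lambda col: np.log(col)),
    "transform": lambda: data.transform(lambda col: np.log(col)),
}

# Run each once so the results can be compared for agreement.
results = {name: fn() for name, fn in candidates.items()}

# Time 20 repetitions of each approach.
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=20)
    print(f"{name}: {seconds:.4f} s for 20 runs")
```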
Upvotes: 2