durbachit

Reputation: 4896

Logarithm of a pandas series/dataframe

In short: How can I take the logarithm of a column of a pandas dataframe? I thought numpy.log() should work on it, but it doesn't. I suspect it's because I have some NaNs in the dataframe?

My whole code is below. It may seem a bit chaotic; basically my ultimate goal (a little exaggerated) is to plot different rows of several selected groups of columns in several subplots (hence the three nested for loops iterating over the different groups... if you can suggest a more elegant solution, I will appreciate it, but that is not the main thing that's pressing me). I need to plot the logarithm of (1 + some values from one dataframe) versus some values from the other dataframe. And here is the problem: on the plotting line with np.log I get this error: AttributeError: 'float' object has no attribute 'log' (and if I use math instead of np, I get this: TypeError: cannot convert the series to <type 'float'>). What can I do about it?

Thank you. Here is the code:

import numpy as np
import math
import pandas as pd
import matplotlib.pyplot as plt

hf = pd.DataFrame({'Z':np.arange(0,100,1),'A':(10*np.random.rand(100)), 'B':(10*np.random.rand(100)),'C':(10*np.random.rand(100)),'D':(10*np.random.rand(100)),'E':(10*np.random.rand(100)),'F':(10*np.random.rand(100))})
df = pd.DataFrame({'Z':np.arange(0,100,1),'A':(10*np.random.rand(100)), 'B':(10*np.random.rand(100)),'C':(10*np.random.rand(100)),'D':(10*np.random.rand(100)),'E':(10*np.random.rand(100)),'F':(10*np.random.rand(100))})
hf.loc[0:5,'A']=np.nan
df.loc[0:5,'A']=np.nan
hf.loc[53:58,'B']=np.nan
df.loc[53:58,'B']=np.nan
hf.loc[90:,'C']=np.nan
df.loc[90:,'C']=np.nan
I = ['A','B']
II = ['C','D']
III = ['E','F']
IV = ['F','A']
runs = [I,II,III,IV]
inds = [10,20,30,40]

fig = plt.figure(figsize=(6,4))
for r in runs:
    data = pd.DataFrame(index=df.index,columns=r)
    HF = pd.DataFrame(index=hf.index,columns=r)
    #pdb.set_trace()
    for i in r:
        data.loc[:,i] = df.loc[:,i]
        HF.loc[:,i] = hf.loc[:,i]
        for c,z in enumerate(inds):
            ax=fig.add_subplot()
            ax = plt.plot(math.log1p(HF.loc[z]),Tdata.loc[z],linestyle=":",marker="o",markersize=5,label=inds[c].__str__())
# or the other version
#plt.plot(np.log(1 + HF.loc[z]),Tdata.loc[z],linestyle=":",marker="o",markersize=5,label=inds[c].__str__())

As @Jason pointed out, this answer did the trick! Thank you!

Upvotes: 2

Views: 10393

Answers (1)

juanpa.arrivillaga

Reputation: 96349

The problem isn't that you have NaN values; it's that you don't have real NaN values. You have the strings "NaN", which the ufunc np.log doesn't know how to deal with. Replace the beginning of your code with:

h = {'Z': np.arange(0,100,1), 'A': 10*np.random.rand(100),
     'B': 10*np.random.rand(100), 'C': 10*np.random.rand(100),
     'D': 10*np.random.rand(100), 'E': 10*np.random.rand(100),
     'F': 10*np.random.rand(100)}
hf = pd.DataFrame(h)
f = {'Z': np.arange(0,100,1), 'A': 10*np.random.rand(100),
     'B': 10*np.random.rand(100), 'C': 10*np.random.rand(100),
     'D': 10*np.random.rand(100), 'E': 10*np.random.rand(100),
     'F': 10*np.random.rand(100)}
df = pd.DataFrame(f)
hf.loc[0:5,'A'] = np.nan
df.loc[0:5,'A'] = np.nan
hf.loc[53:58,'B'] = np.nan
df.loc[53:58,'B'] = np.nan
hf.loc[90:,'C'] = np.nan
df.loc[90:,'C'] = np.nan

And everything should work nicely with np.log.
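If you can't easily change how the frame is built, a possible alternative is to coerce the offending column to a numeric dtype before taking the logarithm. The snippet below is only a minimal sketch: the series `raw` is made-up illustration data (not taken from the question), and it assumes the column ended up with dtype object because strings are mixed in with the numbers. It uses pd.to_numeric with errors="coerce", which turns anything unparseable into a real np.nan, and np.log1p, which computes log(1 + x) element-wise.

```python
import numpy as np
import pandas as pd

# Hypothetical column: numbers mixed with the string "NaN", so pandas
# stores it with dtype=object instead of float64.
raw = pd.Series([1.0, "NaN", 9.0, 0.0])

# On an object-dtype column the ufunc has no numeric loop to use, so it fails.
# The exception type differs between NumPy versions, hence both are caught.
try:
    np.log(raw)
    failed = False
except (AttributeError, TypeError):
    failed = True

# Coerce to a numeric dtype; values that cannot be parsed become np.nan.
clean = pd.to_numeric(raw, errors="coerce")

# np.log1p(x) is log(1 + x), applied element-wise; NaN simply propagates.
result = np.log1p(clean)
```

Checking `raw.dtype` against `clean.dtype` is a quick way to tell whether a column is really numeric. Note that math.log and math.log1p only accept a single scalar, which is why passing a whole Series to them raises the TypeError quoted in the question.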

Upvotes: 4
