Birish

Reputation: 5822

scipy.stats.norm.pdf() for an entire dataframe

For the following dataframe, I'm trying to add a new column called pdf that holds the normal pdf of each row's data value, using that row's mean and std:

df= 
        id/uniqueID       data   mean    std   
        5171/0            10.0    2.8     0.0   
        5171/1            40.9    2.5     3.4   
        5171/2            60.7    3.1     5.2   
        ...
        5171/57           0.5     1.3     5.1   
        4567/0            1.5     2.0     1.0   
        4567/1            4.4     2.0     1.3   
        4567/2            6.3     3.0     1.5   
        ...
        4567/57           0.7     1.4     1.6   
       ... 
        9584/0            0.3     2.6     0.0   
        9584/1            0.5     1.2     8.3   
        9584/2            0.7     3.0     5.6   
        ...
        9584/57           0.7     1.3     0.1   

Here is how I tried to do it:

idxs = unique(df.index).tolist()
df['pdf'] = None

for idx in idxs:
   df['pdf'].loc[idx] = norm(loc=df['mean'].loc[idx], scale=df['std'].loc[idx]).pdf(df['data'].loc[idx])

This raises the following error: TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule "safe"

It works if I calculate pdf() for a single value:

norm(loc=df['mean'].loc[idx][0], scale=df['std'].loc[idx][0]).pdf(df['data'].loc[idx][0])

What is the problem, and how can I fix it?

Upvotes: 0

Views: 2269

Answers (1)

yatu

Reputation: 88226

You can use DataFrame.apply with axis=1 to evaluate the pdf row by row, passing each row's mean and std as loc and scale:

from scipy.stats import norm

df.apply(lambda x: norm(loc=x['mean'], scale=x['std']).pdf(x['data']), axis=1)

0              NaN
1     2.348336e-29
2     1.743114e-28
3     7.726749e-02
4     3.520653e-01
5     5.582995e-02
6     2.364973e-02
7     2.265827e-01
8              NaN
9     4.789470e-02
10    6.547753e-02
11    6.075883e-08
dtype: float64
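
As a possible alternative to the row-wise apply, norm.pdf also accepts array-like loc and scale and broadcasts over them, so the whole column can be computed in one call. The sketch below uses a small made-up dataframe with the same column names as the question. The astype(float) cast is an assumption: it guards against the columns having been read as object dtype, which is one plausible (but unconfirmed) source of the 'isnan' TypeError. Rows where std is 0 come out as NaN, since scipy treats a non-positive scale as invalid.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "data": [10.0, 40.9, 60.7, 1.5],
    "mean": [2.8, 2.5, 3.1, 2.0],
    "std":  [0.0, 3.4, 5.2, 1.0],
})

# Assumption: force a float dtype in case the columns are stored as object.
cols = df[["data", "mean", "std"]].astype(float)

# norm.pdf broadcasts element-wise over data, loc and scale.
# errstate silences the divide-by-zero warning caused by std == 0;
# those rows still evaluate to NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    df["pdf"] = norm.pdf(cols["data"], loc=cols["mean"], scale=cols["std"])

print(df)
```

If NaN is not wanted for the zero-std rows, they can be filtered out or filled afterwards, for example with df["pdf"].fillna(0.0); which value makes sense there depends on the data.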

Upvotes: 2
