Reputation: 5822
For the following dataframe, I'm trying to add a new column called pdf
that holds the normal probability density of each data value, evaluated with that row's mean and std:
df=
id/uniqueID data mean std
5171/0 10.0 2.8 0.0
5171/1 40.9 2.5 3.4
5171/2 60.7 3.1 5.2
...
5171/57 0.5 1.3 5.1
4567/0 1.5 2.0 1.0
4567/1 4.4 2.0 1.3
4567/2 6.3 3.0 1.5
...
4567/57 0.7 1.4 1.6
...
9584/0 0.3 2.6 0.0
9584/1 0.5 1.2 8.3
9584/2 0.7 3.0 5.6
...
9584/57 0.7 1.3 0.1
Here is how I tried to do it:
idxs = unique(df.index).tolist()
df['pdf'] = None
for idx in idxs:
    df['pdf'].loc[idx] = norm(loc=df['mean'].loc[idx], scale=df['std'].loc[idx]).pdf(df['data'].loc[idx])
which gives me this error: {TypeError} ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule "safe"
It works if I calculate the pdf() for one single value:
norm(loc=df['mean'].loc[idx][0], scale=df['std'].loc[idx][0]).pdf(df['data'].loc[idx][0])
What is the problem, and how can I fix it?
Upvotes: 0
Views: 2269
Reputation: 88226
You can use apply with axis set to 1, so that the expression above is evaluated once per row:
from scipy.stats import norm
df.apply(lambda x: norm(loc=x['mean'], scale=x['std']).pdf(x['data']), axis=1)
0 NaN
1 2.348336e-29
2 1.743114e-28
3 7.726749e-02
4 3.520653e-01
5 5.582995e-02
6 2.364973e-02
7 2.265827e-01
8 NaN
9 4.789470e-02
10 6.547753e-02
11 6.075883e-08
dtype: float64
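The NaN entries line up with the rows whose std is 0.0, since a normal distribution with zero scale has no defined density. As a possible alternative to the row-wise apply, norm.pdf also accepts array-like loc and scale arguments, so the whole column can be computed in one vectorized call. The following is a minimal sketch, assuming the columns are named data, mean and std as in the question; the sample rows are hypothetical values modelled on the question's table.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical sample rows modelled on the question's columns
df = pd.DataFrame({
    "data": [10.0, 40.9, 60.7, 1.5],
    "mean": [2.8, 2.5, 3.1, 2.0],
    "std":  [0.0, 3.4, 5.2, 1.0],
})

# norm.pdf broadcasts x, loc and scale element-wise, so no Python loop is needed.
# Rows with std == 0 come out as NaN because scale must be strictly positive.
df["pdf"] = norm.pdf(
    df["data"].to_numpy(),
    loc=df["mean"].to_numpy(),
    scale=df["std"].to_numpy(),
)

print(df)
```

On large frames this usually avoids the per-row overhead of building a frozen distribution inside apply, while producing the same values.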
Upvotes: 2