Jan Erst

Reputation: 89

Scipy-stats, how to make it faster/is it possible?

I'm currently using the following code:

    prob = scipy.stats.norm(mu, np.sqrt(sigma)).pdf(o)  # correct
    return prob

I don't think there is much need to explain the variables, but mu is the expected value, sigma is my variance and o is an observation, and I want to find the probability (density) of the given observation. It works, but it is very slow because I call it many times, and I got much faster results by writing out the normal distribution formula myself and computing the probability from that.

My question:

Is there a smarter way for me to call this function?

Upvotes: 0

Views: 1024

Answers (1)

slackline

Reputation: 2417

Two approaches...

Vectorisation

Takes advantage of the fact scipy/numpy perform calculations on arrays...

import numpy as np
from scipy.stats import norm

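observations = np.random.rand(1000)
mu = np.mean(observations)
sigma = np.var(observations)  # note: sigma holds the variance here
# One call evaluates the density for every observation at once
prob = norm(mu, np.sqrt(sigma)).pdf(observations)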
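
The question mentions that writing out the normal distribution by hand was faster; that formula vectorises in the same way. A minimal sketch, assuming a hypothetical helper called normal_pdf (not part of scipy) and that sigma holds the variance as in the question:

```python
import numpy as np
from scipy.stats import norm


def normal_pdf(x, mu, variance):
    """Normal density written out by hand; accepts scalars or arrays."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)


observations = np.random.rand(1000)
mu = np.mean(observations)
sigma = np.var(observations)  # variance, as in the question

manual = normal_pdf(observations, mu, sigma)
# scipy's result for the same inputs, used only as a cross-check
reference = norm(mu, np.sqrt(sigma)).pdf(observations)
```

Both arrays should agree to floating-point precision; which one is faster on your data is something to measure rather than assume.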

List Comprehension

This is a lot slower, but if your observations are in a list then you can...

list_of_observations = list(np.random.rand(1000))
mu = np.mean(list_of_observations)
sigma = np.var(list_of_observations)
prob = [norm(mu, np.sqrt(sigma)).pdf(o) for o in list_of_observations]

...but it's easy to convert a list to an array with np.asarray() and then use the former solution...

norm(mu, np.sqrt(sigma)).pdf(np.asarray(list_of_observations))

Note also that if you are calculating the variance (sigma) yourself then rather than using np.var() you can get the standard deviation directly using np.std().
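
As a rough illustration of that last point, np.std() returns the same value as the square root of np.var(), so it can be passed to norm directly. The variable name std is just a choice for this sketch:

```python
import numpy as np
from scipy.stats import norm

observations = np.random.rand(1000)
mu = np.mean(observations)
std = np.std(observations)  # same value as np.sqrt(np.var(observations))

# No square root needed when the standard deviation is computed directly
prob = norm(mu, std).pdf(observations)
```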

Upvotes: 1
