pythonnewbie

Reputation: 495

Sampling methods

Can you help me out with these questions? I'm using Python.

Sampling Methods

Sampling (or Monte Carlo) methods form a general and useful set of techniques that use random numbers to extract information about (multivariate) distributions and functions. In the context of statistical machine learning, we are most often concerned with drawing samples from distributions to obtain estimates of summary statistics such as the mean value of the distribution in question.

When we have access to a uniform (pseudo)random number generator on the unit interval (rand in Matlab or runif in R), we can use the transformation sampling method described in Bishop Sec. 11.1.1 to draw samples from more complex distributions. Implement the transformation method for the exponential distribution

$$p(y) = \lambda \exp(-\lambda y), \quad y \geq 0$$

using the expressions given at the bottom of page 526 in Bishop, namely the cumulative distribution function $h(y) = 1 - \exp(-\lambda y)$ and its inverse $y = -\lambda^{-1} \ln(1 - z)$, which maps a uniform $z$ on $(0, 1)$ to an exponentially distributed $y$.
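
A minimal sketch of the transformation method in plain Python, assuming the standard library's random module as the uniform generator (the Python counterpart of rand/runif). Each uniform draw $z$ is pushed through the inverse CDF $y = -\lambda^{-1} \ln(1 - z)$. The function name sample_exponential is illustrative, not something from Bishop.

```python
import math
import random


def sample_exponential(lam, n, rng=None):
    """Draw n samples from p(y) = lam * exp(-lam * y) by the transformation method.

    random.random() returns z in [0, 1), so 1 - z lies in (0, 1] and the
    logarithm in the inverse CDF y = -ln(1 - z) / lam is always defined.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if rng is None:
        rng = random.Random()
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]


# Usage: 5 draws with rate lam = 2.0 (true mean 0.5), seeded for reproducibility.
samples = sample_exponential(2.0, 5, random.Random(0))
print(samples)
```

Passing an explicit random.Random instance keeps the experiments below reproducible without touching the global generator.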

A crucial question for sampling methods is how many samples are needed to obtain a reliable estimate of the quantity of interest. Suppose we want to estimate the mean, which is

$$\mu_y = 1/\lambda$$

for the distribution above. We then use the sample mean

$$b_y = \frac{1}{L} \sum_{\ell=1}^{L} y^{(\ell)}$$

of the $L$ samples as our estimator. Since we can generate as many samples of size $L$ as we want, we can investigate how this estimate converges, on average, to the true mean. To do this properly, we take the absolute difference

$$|\mu_y - b_y|$$

between the true mean $\mu_y$ and the estimate $b_y$, averaged over many (say 1000) repetitions, for several values of $L$, say 10, 100 and 1000. Plot the expected absolute deviation as a function of $L$. Can you plot some transformed version of the expected absolute deviation to get a roughly straight line, and what does this mean?
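
One way to run this experiment is sketched below in standard-library Python, with the assumed settings $\lambda = 1$, 1000 repetitions and $L = 10, 100, 1000$. On the "transformed value" question: by the central limit theorem, for large $L$ the estimate $b_y$ is approximately normal with standard deviation $\sigma/\sqrt{L}$, and $\sigma = 1/\lambda$ for the exponential, so $E|\mu_y - b_y| \approx \sqrt{2/\pi}/(\lambda\sqrt{L})$. Taking logarithms gives $\log E|\mu_y - b_y| \approx \log\big(\sqrt{2/\pi}/\lambda\big) - \tfrac{1}{2}\log L$, a straight line with slope about $-1/2$ on log-log axes. In words, the error shrinks like $1/\sqrt{L}$: ten times the accuracy costs roughly a hundred times as many samples. All function names here are hypothetical helpers.

```python
import math
import random
import statistics


def sample_mean_exponential(lam, L, rng):
    """Sample mean b_y of L exponential draws made by the transformation method."""
    return statistics.fmean(-math.log(1.0 - rng.random()) / lam for _ in range(L))


def expected_abs_deviation(lam, L, repetitions=1000, seed=0):
    """Average of |mu_y - b_y| over independent repetitions with sample size L."""
    rng = random.Random(seed)
    mu = 1.0 / lam  # true mean of the exponential distribution
    return statistics.fmean(
        abs(mu - sample_mean_exponential(lam, L, rng)) for _ in range(repetitions)
    )


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def plot_deviation(Ls, devs, lam):
    """Optional log-log plot; imports matplotlib only when called."""
    import matplotlib.pyplot as plt

    clt = [math.sqrt(2 / math.pi) / (lam * math.sqrt(L)) for L in Ls]
    plt.loglog(Ls, devs, "o-", label="simulated")
    plt.loglog(Ls, clt, "--", label="CLT approximation")
    plt.xlabel("L")
    plt.ylabel("average |mu_y - b_y|")
    plt.legend()
    plt.show()


lam = 1.0
Ls = [10, 100, 1000]
devs = [expected_abs_deviation(lam, L) for L in Ls]
slope = loglog_slope(Ls, devs)

for L, d in zip(Ls, devs):
    print(f"L = {L:4d}: average |mu_y - b_y| = {d:.4f}")
print(f"log-log slope: {slope:.3f}")  # expected to land near -0.5
```

Calling plot_deviation(Ls, devs, lam) draws the simulated curve next to the CLT line if matplotlib is installed; the numbers and the fitted slope do not depend on it.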

I'm new to this kind of statistical machine learning and really don't know how to implement it in Python. Can you help me out?

Upvotes: 2

Views: 3758

Answers (1)

Cam.Davidson.Pilon

Reputation: 1716

There are a few shortcuts you can take. Python's scientific libraries already provide sampling methods, mainly in SciPy. I can recommend a manuscript that implements this idea in Python (disclaimer: I am the author), located here.
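
As a small example of such a shortcut, the sketch below uses the standard library's random.expovariate (a stand-in here, not the SciPy routine mentioned above) and checks it against the hand-written transformation $y = -\lambda^{-1}\ln(1 - z)$. Note that random.expovariate takes the rate $\lambda$, while SciPy's scipy.stats.expon and NumPy's Generator.exponential are parameterized by the scale $1/\lambda$, which is an easy thing to mix up.

```python
import math
import random
import statistics

lam = 2.0
n = 20_000
rng = random.Random(42)

# Library sampler: random.expovariate takes the rate lambda (not the scale 1/lambda).
library = [rng.expovariate(lam) for _ in range(n)]

# Hand-written transformation method, for comparison: y = -ln(1 - z) / lambda.
manual = [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

# Both should match the exponential's mean 1/lambda and median ln(2)/lambda.
for name, ys in (("expovariate", library), ("transformation", manual)):
    print(f"{name:15s} mean={statistics.fmean(ys):.4f} median={statistics.median(ys):.4f}")
print(f"{'theory':15s} mean={1 / lam:.4f} median={math.log(2) / lam:.4f}")
```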

It is part of a larger book, but this stand-alone chapter deals with the more general Law of Large Numbers and convergence, which is what you are describing. The chapter works with Poisson random variables, but you should be able to adapt the code to your own situation.
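
The Law of Large Numbers behaviour can be seen by tracking the running sample mean of one long stream of draws. This is a minimal sketch adapted to the exponential case, not the chapter's own Poisson code; the helper running_means and the choice $\lambda = 0.5$ (true mean 2) are assumptions for illustration.

```python
import math
import random


def running_means(lam, n, seed=0):
    """Running sample means b_1, ..., b_n of exponential draws (transformation method)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for k in range(1, n + 1):
        total += -math.log(1.0 - rng.random()) / lam
        means.append(total / k)
    return means


lam = 0.5  # true mean 1 / lam = 2.0
means = running_means(lam, 100_000)
for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {k:6d} draws: running mean = {means[k - 1]:.4f}")
```

The running mean wanders for small $k$ and settles toward $1/\lambda$ as $k$ grows; averaging the absolute error over many such streams gives the $1/\sqrt{L}$ curve the question asks about.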

Upvotes: 1
