Steve

Reputation: 51

How can I build a good approximation of an unknown distribution, given only samples from it, so that I can draw new samples from it in PyTorch?

Say I just have random samples from the distribution and no other data, e.g. a list of numbers like [1, 15, 30, 4, etc.]. What's the best way to estimate the distribution so that I can draw more samples from it in PyTorch?

I am currently assuming that all samples come from a normal distribution and am just using the mean and std of the samples to build one and draw from it. The underlying distribution, however, could be anything.

import torch
from torch.distributions import Normal

samples = torch.tensor([1., 2., 3., 4., 3., 2., 2., 1.])
Normal(samples.mean(), samples.std()).sample()

Upvotes: 1

Views: 1323

Answers (2)

Szymon Maszke

Reputation: 24815

If you have enough samples (and preferably the sample dimension is higher than 1), you could model the distribution using a Variational Autoencoder or a Generative Adversarial Network (though I would stick with the first approach, as it's simpler).

Basically, after a correct implementation and training, you get a deterministic decoder that maps a latent code you pass to it (say, a vector of size 10 drawn from a normal distribution) into a value from your target distribution.

Note that it might not be reliable at all, though, and it would be even harder if your samples are only 1D.
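
To make this concrete, here is a minimal sketch of such a VAE for 1D samples in PyTorch; the layer sizes, the 10-dimensional latent code, the learning rate, and the number of training steps are illustrative assumptions rather than tuned values.

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, latent_dim=10):
        super().__init__()
        # Encoder maps a 1D sample to the mean and log-variance of a latent code.
        self.encoder = nn.Sequential(nn.Linear(1, 32), nn.ReLU())
        self.mu_head = nn.Linear(32, latent_dim)
        self.logvar_head = nn.Linear(32, latent_dim)
        # Decoder maps a latent code back to a 1D sample.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick: sample the latent code in a differentiable way.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

samples = torch.tensor([1., 2., 3., 4., 3., 2., 2., 1.]).unsqueeze(1)  # shape (N, 1)
vae = TinyVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-2)

for _ in range(2000):
    recon, mu, logvar = vae(samples)
    # ELBO-style loss: reconstruction error plus KL divergence to the standard normal prior.
    recon_loss = ((recon - samples) ** 2).mean()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon_loss + kl
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, the (deterministic) decoder turns random latent codes
# drawn from a normal distribution into new approximate samples.
with torch.no_grad():
    new_samples = vae.decoder(torch.randn(5, 10)).squeeze(1)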

Upvotes: 1

Jan

Reputation: 1240

The best way depends on what you want to achieve. If you don't know the underlying distribution, you need to make assumptions about it and then fit a suitable distribution (one you know how to sample from) to your samples. You may start with something simple like a Mixture of Gaussians (several normal distributions with different weightings).
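
For example, a small Gaussian mixture can be fitted directly in PyTorch by minimizing the negative log-likelihood of your samples with gradient descent; the number of components, learning rate, and number of steps below are assumptions you would tune for your data.

import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

samples = torch.tensor([1., 2., 3., 4., 3., 2., 2., 1.])

K = 2  # number of mixture components (an assumption; choose what fits your data)
logits = torch.zeros(K, requires_grad=True)    # unnormalized mixture weights
means = torch.randn(K, requires_grad=True)     # component means
log_stds = torch.zeros(K, requires_grad=True)  # component stds, in log space

opt = torch.optim.Adam([logits, means, log_stds], lr=0.05)
for _ in range(500):
    mix = MixtureSameFamily(Categorical(logits=logits), Normal(means, log_stds.exp()))
    loss = -mix.log_prob(samples).mean()  # negative log-likelihood of the observed samples
    opt.zero_grad()
    loss.backward()
    opt.step()

# Rebuild the fitted mixture and draw new samples from it.
mix = MixtureSameFamily(Categorical(logits=logits), Normal(means, log_stds.exp()))
new_samples = mix.sample((5,))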

Another way is to define a discrete distribution over the values you have. You give each value the same probability, p(x) = 1/N. To sample from it, you simply draw a random integer from [0, N) that indexes one of your samples.
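
In PyTorch this empirical distribution amounts to indexing your sample tensor with random integers; drawing 5 new values below is just for illustration.

import torch

samples = torch.tensor([1., 2., 3., 4., 3., 2., 2., 1.])

# Each observed value has probability 1/N; sampling means picking a random index.
N = samples.numel()
idx = torch.randint(low=0, high=N, size=(5,))
new_samples = samples[idx]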

Upvotes: 0
