phhu
phhu

Reputation: 1992

How can I write an arbitrary continuous distribution in Julia, or at least simulate sampling from one?

Suppose I have an arbitrary probability distribution function (PDF) defined as a function f, for example:

using Random, Distributions

#// PDF with parameter θ ϵ (0,1), uniform over 
#// segments [-1,0) and [0,1], zero elsewhere
f = θ -> x -> 
  (-1 <= x < 0) ? θ^2 :
  (0 <= x <= 1) ? 1-θ^2 :
  0

How can I sample values from a random variable with this PDF in Julia? (or alternatively, how can I at least simulate sampling from such a random variable?)

i.e. I want the equivalent of rand(Normal(),10) for 10 values from a (standard) normal disribution, but I want to use the function f to define the distribution used (something like rand(f(0.4),10) - but this doesn't work)

(This is already an answer for discrete distributions at How can I write an arbitrary discrete distribution in Julia?: however I'm wanting to use a continuous distribution. There are some details of creating a sampler at https://juliastats.org/Distributions.jl/v0.14/extends.html which I think might be useful, but I don't understand how to apply them. Also in R I've used the inverse CDF technique as described at https://blogs.sas.com/content/iml/2013/07/22/the-inverse-cdf-method.html for simulating such random variables, but am unsure how it might best be implemented in Julia.)

Upvotes: 2

Views: 532

Answers (1)

Colin T Bowers
Colin T Bowers

Reputation: 18580

First problem is that what you've provided is not a complete specification of a probability distribution since it doesn't say anything about the distribution within the interval [-1, 0) or within the interval [0, 1]. So for the purposes of this answer I'm going to assume your probability distribution function is uniform on each of these intervals. Given this, then I would argue the most Julian way to implement your own distribution would be to create a new subtype, in this case, of ContinuousUnivariateDistribution. Example code follows:

using Distributions
struct MyDistribution <: ContinuousUnivariateDistribution
    theta::Float64
    function MyDistribution(theta::Float64)
        !(0 <= theta <= 1) && error("Invalid theta: $(theta)")
        new(theta)
    end
end
function Distributions.rand(d::MyDistribution)::Float64
    if rand() < d.theta^2
        x = rand() - 1
    else
        x = rand()
    end
    return x
end
function Distributions.quantile(d::MyDistribution, p::Real)::Float64
    !(0 <= p <= 1) && error("Invalid probability input: $(p)")
    if p < d.theta^2
        x = -1.0 + (p / d.theta^2)
    else
        x = (p - d.theta^2) / (1 - d.theta^2)
    end
    return x
end

In the above code I have implemented a rand and quantile method for the new distribution which is the minimum to be able to make function calls like rand(MyDistribution(0.4), 20) to sample 20 random numbers from your new distribution. See here for a list of other methods you may want to add to your new distribution type (depending on your use-case perhaps you won't bother).

Note, if efficiency is an issue, you may look into some of the methods that will allow you to minimise the number of d.theta^2 operations, e.g. Distributions.sampler. Alternatively, you could just store theta^2 internally in MyDistribution but always display the underlying theta. Up to you really.

Finally, you don't really need type annotations on function outputs. I've just included them for clarity.

Upvotes: 3

Related Questions