bbublue

Reputation: 25

How can I “translate” a statistical model defined on paper to the computer using R?

I initially posted this question on stats.stackexchange.com, but it was closed there for being focused on programming. Hopefully, I can get some help here.

I will not put many theoretical details here to make it simple, but my final goal is to implement a Hidden Markov Model using R.

Although I am fine with the theoretical model construction, when I tried to implement it, I realized that I do not know basic things about computational statistics. My question goes in that direction.

Let $X$ and $Y$ be random variables, where $\pi(\cdot)$ denotes a distribution, such that $\pi(X = 0) = p$, $\pi(X = 1) = 1 - p$, and $Y \mid X = x \sim \mathcal{N}(x, \sigma^2)$, with $p \in [0, 1]$ and $\sigma^2 \in [0, +\infty)$. How can I compute

$$\pi(x \mid y) = \frac{\pi(x)\,\pi(y \mid x)}{\sum_{x' \in \{0,1\}} \pi(x')\,\pi(y \mid x')}$$

using R?

I mean, what is the exact meaning of multiplying these two distributions (one discrete and one continuous)? How can I do this using R? The answer is obviously a function of $x$, but how is it represented in my code?

Is there any change if $Y \mid X$ is also discrete? For instance, $Y \mid X \sim \mathrm{Bernoulli}(q)$, with $q \in [0, 1]$. How would it affect the implemented code?

I know my questions are not very specific, but I am quite lost about how to start. My goal with this question is to understand how I can "translate" what I have written on paper to the computer.

Upvotes: 0

Views: 130

Answers (1)

merv

Reputation: 76720

Translation

The equations describe how to compute the probability distribution of X given an observation of Y=y and values for the parameters p and sigma. Ultimately, you want to implement a function p_X_given_Y that takes a value of Y and returns a probability distribution for X. A good place to start is to implement the two functions used in the RHS of the expression. Something like:

p_X <- function (x, p=0.5) { switch(as.character(x), "0"=p, "1"=1-p, 0) }

p_Y_given_X <- function (y, x, sigma=1) { dnorm(y, x, sd=sigma) }

Note that p and sigma are picked arbitrarily here. These functions can then be used to define the p_X_given_Y function:

p_X_given_Y <- function (y) {
  # numerators: for each x \in X
  ps <- sapply(c("0"=0,"1"=1), 
               function (x) { p_X(x) * p_Y_given_X(y, x) })

  # divide out denominator
  ps / sum(ps)
}

which can be used like:

> p_X_given_Y(y=0)
#         0         1 
# 0.6224593 0.3775407

> p_X_given_Y(y=0.5)
#   0   1 
# 0.5 0.5 

> p_X_given_Y(y=2)
#         0         1 
# 0.1824255 0.8175745 

These numbers should make intuitive sense (given p=0.5): Y=0 is more likely to come from X=0, Y=0.5 is equally likely to be X=0 or X=1, etc. This is only one way of implementing it; the idea is to return the "distribution of X", which in this case is simply a named numeric vector whose names ("0", "1") correspond to the support of X and whose values are the probability masses.
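
Regarding the discrete case raised in the question: only the likelihood function would need to change; p_X and p_X_given_Y can stay exactly as they are. Below is a minimal sketch, assuming for illustration that Y | X = x is Bernoulli with a state-specific success probability (the parameters q0 and q1 are made up here, not taken from the question):

# Sketch only: assumes Y | X = x ~ Bernoulli(q_x), with hypothetical
# state-specific parameters q0 and q1, in place of the Gaussian likelihood.
p_Y_given_X_discrete <- function (y, x, q0=0.2, q1=0.7) {
  q <- if (x == 0) q0 else q1
  dbinom(y, size=1, prob=q)   # Bernoulli pmf evaluated at y (0 or 1)
}

Swapping this in for p_Y_given_X inside p_X_given_Y leaves everything else unchanged: the normalization step works the same way whether the likelihood is a density or a probability mass.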

Some alternative implementations might be:

  • a p_X_given_Y(x,y) that also takes a value for x and returns the corresponding probability mass
  • a p_X_given_Y(y) that returns another function that takes an x argument and returns the corresponding probability mass (i.e., the probability mass function); see the sketch below
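
As a minimal sketch of that second alternative (the name pmf_X_given_Y and its structure are illustrative, not part of the answer above), reusing the same p_X and p_Y_given_X helpers:

# Sketch: closure-based variant. Given an observation y, return the
# posterior pmf of X as a function of x (pmf_X_given_Y is a made-up name).
pmf_X_given_Y <- function (y) {
  # normalizing constant: sum over the support of X
  Z <- p_X(0) * p_Y_given_X(y, 0) + p_X(1) * p_Y_given_X(y, 1)
  function (x) { p_X(x) * p_Y_given_X(y, x) / Z }
}

which could be used like:

> f <- pmf_X_given_Y(y=0)
> f(0)
# [1] 0.6224593
> f(1)
# [1] 0.3775407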

Upvotes: 1
