user98235

Reputation: 906

Calculating the transition probabilities in R

Let's assume that we have the following 4 states: (A, B, C, D)

The table I have has the following format

old   new 
A      B
A      A
B      C
D      B
C      D
.      .
.      .
.      .
.      .

I would like to calculate the following probabilities based on the data given in the table:

P(new=A | old=A)
P(new=B | old=A)
P(new=C | old=A)
P(new=D | old=A)
P(new=A | old=B)
.
.
.
.
P(new=C | old=D)
P(new=D | old=D)

I can do it manually, by counting how often each transition happens and dividing by the number of rows, but I was wondering if there's a built-in function in R that calculates those probabilities, or at least helps speed up the calculation.

Any help/input would be greatly appreciated. If there's no such function, oh well.

Upvotes: 3

Views: 4117

Answers (1)

lmo

Reputation: 38510

In base R, you could use prop.table on a table object:

transMat <- prop.table(with(df, table(old, new)), 2)
transMat
   new
old          A          B          C          D
  A 0.26315789 0.27272727 0.18181818 0.22222222
  B 0.31578947 0.36363636 0.09090909 0.22222222
  C 0.21052632 0.27272727 0.45454545 0.33333333
  D 0.21052632 0.09090909 0.27272727 0.22222222

Here, the columns sum to 1:

colSums(transMat)
A B C D 
1 1 1 1

Edit: On further reflection, I think margin=1 is actually what is wanted here, since old (the conditioning variable) is in the rows and p(A|A) + p(B|A) + p(C|A) + p(D|A) should equal 1. In this scenario,

transMat <- prop.table(with(df, table(old, new)), 1)
transMat
   new
old          A          B          C          D
  A 0.41666667 0.25000000 0.16666667 0.16666667
  B 0.46153846 0.30769231 0.07692308 0.15384615
  C 0.26666667 0.20000000 0.33333333 0.20000000
  D 0.40000000 0.10000000 0.30000000 0.20000000

will work. Alternatively, prop.table(with(df, table(new, old)), 2) gives the transpose of this matrix.
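
As a sanity check on the row-wise version, here is a minimal sketch that confirms each row sums to 1 and pulls out a single conditional probability. The `states` and `counts` names and the explicit factor levels are my own additions, not part of the original answer. The explicit levels only make sure all four states get a row and a column even if one never shows up in the data. Note that a state that never occurs in old would give a row of NaN, since its row total is 0.

```r
# Example data in the same shape as the question's table (illustrative only)
set.seed(1234)
states <- LETTERS[1:4]
df <- data.frame(old = sample(states, 50, replace = TRUE),
                 new = sample(states, 50, replace = TRUE))

# Fix the levels so every state gets a row and a column, even if unobserved
counts <- table(old = factor(df$old, levels = states),
                new = factor(df$new, levels = states))

# Row-wise proportions: entry [i, j] estimates P(new = j | old = i)
transMat <- prop.table(counts, margin = 1)

rowSums(transMat)      # each row should sum to 1
transMat["A", "B"]     # estimate of P(new = B | old = A)
```

Indexing by state name, as in transMat["A", "B"], avoids having to remember which position each state occupies in the matrix.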

data

set.seed(1234)
df <- data.frame(old=sample(LETTERS[1:4], 50, replace=TRUE),
                 new=sample(LETTERS[1:4], 50, replace=TRUE))

Upvotes: 9
