Reputation: 906
Let's assume that we have the following 4 states: (A, B, C, D)
The table I have has the following format
old new
A B
A A
B C
D B
C D
. .
. .
. .
. .
I would like to calculate the following probabilities based on the data given in the table:
P(new=A | old=A)
P(new=B | old=A)
P(new=C | old=A)
P(new=D | old=A)
P(new=A | old=B)
.
.
.
.
P(new=C | old=D)
P(new=D | old=D)
I can do this manually by counting how often each transition occurs and dividing by the number of rows, but I was wondering whether there is a built-in function in R that calculates these probabilities, or at least helps speed up the calculation.
Any help/input would be greatly appreciated. If there's no such function, oh well.
Upvotes: 3
Views: 4117
Reputation: 38510
In base R, you could use prop.table on a table object:
transMat <- prop.table(with(df, table(old, new)), 2)
transMat
   new
old          A          B          C          D
  A 0.26315789 0.27272727 0.18181818 0.22222222
  B 0.31578947 0.36363636 0.09090909 0.22222222
  C 0.21052632 0.27272727 0.45454545 0.33333333
  D 0.21052632 0.09090909 0.27272727 0.22222222
Here, the columns sum to 1:
colSums(transMat)
A B C D
1 1 1 1
Edit: On further reflection, I think margin=1 is what is actually wanted here, since old (the variable being conditioned on) is in the rows and p(A|A) + p(B|A) + p(C|A) + p(D|A) should equal 1, so each row should sum to 1. In this scenario,
transMat <- prop.table(with(df, table(old, new)), 1)
transMat
   new
old          A          B          C          D
  A 0.41666667 0.25000000 0.16666667 0.16666667
  B 0.46153846 0.30769231 0.07692308 0.15384615
  C 0.26666667 0.20000000 0.33333333 0.20000000
  D 0.40000000 0.10000000 0.30000000 0.20000000
will work. Alternatively, prop.table(with(df, table(new, old)), 2) gives the transpose of this matrix.
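As a quick check on the row-normalised version, each row should sum to 1, and a single conditional probability can be read off by indexing with the state names. A minimal sketch, using a small made-up data frame (the names toy and toyMat and the values are illustrative only, not the data used in this answer):

```r
# Toy transitions (hypothetical values, for illustration only)
toy <- data.frame(old = c("A", "A", "A", "B", "B", "C", "D", "D"),
                  new = c("A", "B", "B", "C", "A", "D", "B", "D"))

# Row-normalised table: entry [i, j] estimates P(new = j | old = i)
toyMat <- prop.table(with(toy, table(old, new)), 1)

# Each row is a conditional distribution, so every row sum should be 1
rowSums(toyMat)

# P(new = B | old = A): 2 of the 3 rows with old == "A" move to B
toyMat["A", "B"]
```

Indexing by name rather than by position avoids mistakes if the states are not in the order you expect.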
data
set.seed(1234)
df <- data.frame(old=sample(LETTERS[1:4], 50, replace=TRUE),
new=sample(LETTERS[1:4], 50, replace=TRUE))
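If you want the list of P(new = x | old = y) values from the question rather than a matrix, the table can be reshaped with as.data.frame. One caveat: table() only shows states that actually occur, so a state that never appears in one column is silently dropped unless the factor levels are fixed first. A sketch under those assumptions, on a small made-up data frame in which D never occurs as a new state (the names obs, states, tab and probs are introduced here for illustration):

```r
# Toy transitions (hypothetical values); note that "D" never appears in new
obs <- data.frame(old = c("A", "A", "B", "C", "D"),
                  new = c("B", "A", "C", "A", "B"))

# Fix the state space so the table is always 4 x 4
states <- c("A", "B", "C", "D")
tab <- table(old = factor(obs$old, levels = states),
             new = factor(obs$new, levels = states))

# One row per (old, new) pair; prob holds the estimate of P(new | old)
probs <- as.data.frame(prop.table(tab, 1), responseName = "prob")
probs
```

A state that never occurs as old would give a row of NaN (0 divided by 0), which is arguably the honest answer, since the data say nothing about transitions out of that state.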
Upvotes: 9