Reputation: 399
This follows up on the question "Transition matrix".
We use its setup:
#Please use the setup in the following **EDIT** section.
#df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(1,2,3,2,3,5,2,4,5,1), xt1 = c(1,4,2,1,1,4,2,2,2,5))
cusip xt xt1
1 A1 1 1
2 A2 2 4
3 A3 3 2
4 A4 2 1
5 A5 3 1
6 A6 5 4
7 A7 2 2
8 A8 4 2
9 A9 5 2
10 A10 1 5
According to the answers in that post, we can get a transition matrix as follows:
res <- with(df, table(xt, xt1)) ## table() to form transition matrix
res/rowSums(res) ## /rowSums() to normalize by row
# xt1
# xt 1 2 4 5
# 1 0.5000000 0.0000000 0.0000000 0.5000000
# 2 0.3333333 0.3333333 0.3333333 0.0000000
# 3 0.5000000 0.5000000 0.0000000 0.0000000
# 4 0.0000000 1.0000000 0.0000000 0.0000000
# 5 0.0000000 0.5000000 0.5000000 0.0000000
Notice that there is no column 3, because no observation is in state 3 at time t+1. Mathematically, however, a transition matrix has to be square. In this situation we still need a column 3 where [3,3]=1 and all other elements are 0 (the rule: for any missing column n or missing row n, set [n,n]=1 and every other element in that row/column to 0), as follows:
# xt1
# xt 1 2 3 4 5
# 1 0.5000000 0.0000000 0.0000000 0.0000000 0.5000000
# 2 0.3333333 0.3333333 0.0000000 0.3333333 0.0000000
# 3 0.5000000 0.5000000 1.0000000 0.0000000 0.0000000
# 4 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
# 5 0.0000000 0.5000000 0.0000000 0.5000000 0.0000000
Can I achieve that without writing a messy for loop? Thank you.
EDIT: Please use this dataset instead:
df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(2,2,3,2,3,5,2,4,5,4), xt1 = c(1,4,2,1,1,4,2,3,2,5))
cusip xt xt1
1 A1 2 1
2 A2 2 4
3 A3 3 2
4 A4 2 1
5 A5 3 1
6 A6 5 4
7 A7 2 2
8 A8 4 3
9 A9 5 2
10 A10 4 5
Now we have the following transition matrix:
res <- with(df, table(xt, xt1))
res/rowSums(res)
xt1
xt 1 2 3 4 5
2 0.50 0.25 0.00 0.25 0.00
3 0.50 0.50 0.00 0.00 0.00
4 0.00 0.00 0.50 0.00 0.50
5 0.00 0.50 0.00 0.50 0.00
Notice that row 1 is missing. I want to add a new row 1 in which [1,1]=1 and all other elements are 0 (so that this row sums to 1), giving something like:
xt1
xt 1 2 3 4 5
1 1.00 0.00 0.00 0.00 0.00
2 0.50 0.25 0.00 0.25 0.00
3 0.50 0.50 0.00 0.00 0.00
4 0.00 0.00 0.50 0.00 0.50
5 0.00 0.50 0.00 0.50 0.00
How can I achieve that (add the missing row)?
Upvotes: 0
Views: 116
Reputation: 66819
Here's a way to do it (only looking at the second question posed):
# setup
df = data.frame(
  cusip = paste("A", 1:10, sep = ""),
  xt = c(2,2,3,2,3,5,2,4,5,4),
  xt1 = c(1,4,2,1,1,4,2,3,2,5)
)
df$xt = factor(df$xt, levels=1:5)
df$xt1 = factor(df$xt1, levels=1:5)
# making the transition frequency table
tab = with(df, prop.table(table(xt,xt1), 1))
# xt1
# xt 1 2 3 4 5
# 1
# 2 0.50 0.25 0.00 0.25 0.00
# 3 0.50 0.50 0.00 0.00 0.00
# 4 0.00 0.00 0.50 0.00 0.50
# 5 0.00 0.50 0.00 0.50 0.00
This is the correct table for describing the frequency of transitions observed in the data df. If, however, you want to impute a transition rule where no data is available, there are some options. The OP wants to impute that any unobserved states are "absorbing states":
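# rows with no observed transitions (all NaN after prop.table)
r = rowSums(tab, na.rm=TRUE) == 0
# replace those rows with the matching rows of the identity matrix
tab[r, ] <- diag(nrow(tab))[r, , drop=FALSE]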
# xt1
# xt 1 2 3 4 5
# 1 1.00 0.00 0.00 0.00 0.00
# 2 0.50 0.25 0.00 0.25 0.00
# 3 0.50 0.50 0.00 0.00 0.00
# 4 0.00 0.00 0.50 0.00 0.50
# 5 0.00 0.50 0.00 0.50 0.00
I don't think this is a good idea, since it hides features of the true data.
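If you decide to impute anyway, the steps above can be wrapped in a small helper. This is only a sketch: the function name transition_matrix, its arguments, and the "imputed" attribute are hypothetical names made up for illustration, not part of base R. It assumes the full set of states is known up front, and it records which rows were filled in so that the imputation stays visible.

```r
# Hypothetical helper (names are illustrative, not from any package):
# builds a square, row-normalized transition matrix over a fixed state set.
transition_matrix <- function(from, to, states = sort(union(from, to))) {
  force(states)  # evaluate the default before `from`/`to` are overwritten

  # Fixed factor levels make table() emit every state as a row and a column.
  from <- factor(from, levels = states)
  to   <- factor(to,   levels = states)
  tab  <- prop.table(table(from, to), 1)  # unobserved rows become NaN

  # Assumption: states with no observed outgoing transition are absorbing.
  unobserved <- rowSums(tab, na.rm = TRUE) == 0
  if (any(unobserved)) {
    tab[unobserved, ] <- diag(length(states))[unobserved, , drop = FALSE]
  }

  # Keep a record of the imputed states instead of silently hiding them.
  attr(tab, "imputed") <- states[unobserved]
  tab
}

# First dataset from the question (state 3 never occurs at time t+1).
m1 <- transition_matrix(
  from = c(1, 2, 3, 2, 3, 5, 2, 4, 5, 1),
  to   = c(1, 4, 2, 1, 1, 4, 2, 2, 2, 5),
  states = 1:5
)

# Second dataset from the question (state 1 never occurs at time t).
m2 <- transition_matrix(
  from = c(2, 2, 3, 2, 3, 5, 2, 4, 5, 4),
  to   = c(1, 4, 2, 1, 1, 4, 2, 3, 2, 5),
  states = 1:5
)
```

Both results are 5x5 and every row sums to 1. Note that for the first dataset this leaves column 3 as all zeros rather than setting [3,3]=1 as the question asks: state 3 does have observed outgoing transitions, so putting a 1 there would make row 3 sum to 2. attr(m2, "imputed") returns the states whose rows were filled in.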
Upvotes: 1