Natalia

Reputation: 399

R: add missing rows without using a for loop

This follows on from the question "Transition matrix".

We use its setup:

#Please use the setup in the following **EDIT** section.
#df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(1,2,3,2,3,5,2,4,5,1), xt1 = c(1,4,2,1,1,4,2,2,2,5))
   cusip xt xt1
1     A1  1   1
2     A2  2   4
3     A3  3   2
4     A4  2   1
5     A5  3   1
6     A6  5   4
7     A7  2   2
8     A8  4   2
9     A9  5   2
10   A10  1   5

According to the answers in that post, we can get a transition matrix as follows:

res <- with(df, table(xt, xt1)) ## table() to form transition matrix
res/rowSums(res)                ## /rowSums() to normalize by row
#    xt1
# xt          1         2         4         5
#   1 0.5000000 0.0000000 0.0000000 0.5000000
#   2 0.3333333 0.3333333 0.3333333 0.0000000
#   3 0.5000000 0.5000000 0.0000000 0.0000000
#   4 0.0000000 1.0000000 0.0000000 0.0000000
#   5 0.0000000 0.5000000 0.5000000 0.0000000 

Notice that there is no column 3, because state 3 never occurs at time t+1. Mathematically, however, a transition matrix has to be square. In this situation we still need a column 3 in which [3,3]=1 and all other elements are 0 (the rule is: for any missing column n or missing row n, set [n,n]=1 and every other element in that row/column to 0), as follows:

#    xt1
# xt          1         2         3         4         5
#   1 0.5000000 0.0000000 0.0000000 0.0000000 0.5000000
#   2 0.3333333 0.3333333 0.0000000 0.3333333 0.0000000
#   3 0.5000000 0.5000000 1.0000000 0.0000000 0.0000000
#   4 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
#   5 0.0000000 0.5000000 0.0000000 0.5000000 0.0000000 

Can I achieve that without writing a messy for loop? Thank you.

EDIT: Please use this dataset instead:

df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(2,2,3,2,3,5,2,4,5,4), xt1 = c(1,4,2,1,1,4,2,3,2,5))
   cusip xt xt1
1     A1  2   1
2     A2  2   4
3     A3  3   2
4     A4  2   1
5     A5  3   1
6     A6  5   4
7     A7  2   2
8     A8  4   3
9     A9  5   2
10   A10  4   5

Now we have the transition matrix as follows:

res <- with(df, table(xt, xt1)) 
res/rowSums(res)                
   xt1
xt     1    2    3    4    5
  2 0.50 0.25 0.00 0.25 0.00
  3 0.50 0.50 0.00 0.00 0.00
  4 0.00 0.00 0.50 0.00 0.50
  5 0.00 0.50 0.00 0.50 0.00

Notice that row 1 is missing. I now want a new row 1 in which [1,1]=1 and all other elements are 0 (so that the row sums to 1), giving something like:

   xt1
xt     1    2    3    4    5
  1 1.00 0.00 0.00 0.00 0.00
  2 0.50 0.25 0.00 0.25 0.00
  3 0.50 0.50 0.00 0.00 0.00
  4 0.00 0.00 0.50 0.00 0.50
  5 0.00 0.50 0.00 0.50 0.00

How can I achieve that (add the missing row)?

Upvotes: 0

Views: 116

Answers (1)

Frank

Reputation: 66819

Here's a way to do it (only looking at the second question posed):

# setup
df = data.frame(
  cusip = paste("A", 1:10, sep = ""), 
  xt = c(2,2,3,2,3,5,2,4,5,4), 
  xt1 = c(1,4,2,1,1,4,2,3,2,5)
)

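# declare all five states as levels so that table() keeps unobserved ones
df$xt   = factor(df$xt, levels=1:5)
df$xt1  = factor(df$xt1, levels=1:5)

# making the transition frequency table (proportions within each row)
tab = with(df, prop.table(table(xt,xt1), 1))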

#    xt1
# xt     1    2    3    4    5
#   1                         
#   2 0.50 0.25 0.00 0.25 0.00
#   3 0.50 0.50 0.00 0.00 0.00
#   4 0.00 0.00 0.50 0.00 0.50
#   5 0.00 0.50 0.00 0.50 0.00
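
As a side note on why row 1 prints empty in the table above: with explicit factor levels, `table()` keeps a row of zero counts for a state that never occurs, and `prop.table()` then divides 0 by 0 in that row. A minimal sketch on toy vectors (`from` and `to` are made-up names and values, not the OP's data):

```r
# Toy vectors (hypothetical): state 1 is never a starting state
from <- factor(c(2, 2, 3), levels = 1:3)
to   <- factor(c(1, 3, 2), levels = 1:3)

counts <- table(from, to)          # 3 x 3 table; row "1" holds only zero counts
props  <- prop.table(counts, 1)    # row "1" becomes 0/0, i.e. NaN

rowSums(counts)                    # row totals, including the empty row
is.nan(props["1", ])               # checks each cell of the empty row for NaN
```

This is also why the `rowSums()` call used further down needs `na.rm=TRUE`.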

This is the correct table for describing the frequency of transitions observed in the data df. If, however, you want to impute a transition rule where no data is available, there are some options. The OP wants to impute that any unobserved states are "absorbing states":

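# rows with no observed transitions (all NaN after prop.table, hence na.rm = TRUE)
r = rowSums(tab, na.rm = TRUE) == 0

# replace each such row with the matching row of the identity matrix
tab[r, ] <- diag(nrow(tab))[r, , drop = FALSE]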

#    xt1
# xt     1    2    3    4    5
#   1 1.00 0.00 0.00 0.00 0.00
#   2 0.50 0.25 0.00 0.25 0.00
#   3 0.50 0.50 0.00 0.00 0.00
#   4 0.00 0.00 0.50 0.00 0.50
#   5 0.00 0.50 0.00 0.50 0.00

I don't think this is a good idea, since it hides features of the true data.
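
If the imputation is wanted anyway, the steps above could be wrapped into a small function so that the set of states does not have to be typed in by hand. This is only a sketch: `transition_matrix` is a hypothetical helper name, and it assumes that the full state space is the union of the values seen in `xt` and `xt1`.

```r
# Hypothetical helper: square transition matrix over every state seen at
# either time; starting states with no observations are made absorbing.
transition_matrix <- function(from, to) {
  states <- sort(union(from, to))               # assumed state space
  counts <- table(factor(from, levels = states),
                  factor(to,   levels = states),
                  dnn = c("xt", "xt1"))
  probs  <- prop.table(counts, 1)               # rows with no counts become NaN
  empty  <- rowSums(counts) == 0                # starting states never observed
  if (any(empty)) {
    # copy the matching identity rows into the empty rows
    probs[empty, ] <- diag(length(states))[empty, , drop = FALSE]
  }
  probs
}

# the dataset from the EDIT section of the question
df <- data.frame(
  xt  = c(2, 2, 3, 2, 3, 5, 2, 4, 5, 4),
  xt1 = c(1, 4, 2, 1, 1, 4, 2, 3, 2, 5)
)
tm <- transition_matrix(df$xt, df$xt1)
rowSums(tm)   # row totals of the completed matrix
```

On the first dataset in the question, every state 1 to 5 occurs as a starting state, so nothing is imputed; the result simply has a column 3 of zeros, and each row still sums to 1.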

Upvotes: 1
