Reputation: 11696
I have a table in R that I need to build an empirical transition matrix (of counts) in R.
The data looks like this:
ExplicitRoll ExplicitRoll_EOM
No Change No Change
No Change 1-> 3
No Change No Change
NoChangeMonthOfPayoff NoChangeMonthOfPayoff
No Change Entry
NoChangeMonthOfPayoff NoChangeMonthOfPayoff
No Change No Change
....
This table is very large and has many types of other colum entries (for instance 1->3
, charged off
etc). The first column represents month t and the second column represents t-1.
Is there a way to very efficiently cound how many transitions from one state to the next? For reference, the entire data set is 18M rows.
Thank you!
Upvotes: 1
Views: 1299
Reputation: 25415
table()
creates such an incidence matrix. You can convert it to dataframe with as.data.frame.matrix
. Example:
df = data.frame(Col1 = c("A","B","C","D","A","B","E"),Col2 = c("B","C","D","B","B","E","A"))
as.data.frame.matrix(table(df))
A B C D E
A 0 2 0 0 0
B 0 0 1 0 1
C 0 0 0 1 0
D 0 1 0 0 0
E 1 0 0 0 0
18 million rows:
df = data.frame(Col1 = sample(letters,18000000,replace = T),sample(letters,18000000,replace = T))
a = Sys.time()
as.data.frame.matrix(table(df))
Sys.time()-a
Time difference of 0.5171118 secs. Hope this helps!
Upvotes: 2