user1357015
user1357015

Reputation: 11696

Efficiently creating a transition matrix for a large data frame in R

I have a table in R that I need to build an empirical transition matrix (of counts) in R.

The data looks like this:

ExplicitRoll               ExplicitRoll_EOM
No Change                      No Change
No Change                      1-> 3
No Change                      No Change
NoChangeMonthOfPayoff      NoChangeMonthOfPayoff
No Change                      Entry
NoChangeMonthOfPayoff      NoChangeMonthOfPayoff
No Change                     No Change
....

This table is very large and has many types of other colum entries (for instance 1->3, charged off etc). The first column represents month t and the second column represents t-1.

Is there a way to very efficiently cound how many transitions from one state to the next? For reference, the entire data set is 18M rows.

Thank you!

Upvotes: 1

Views: 1299

Answers (1)

Florian
Florian

Reputation: 25415

table() creates such an incidence matrix. You can convert it to dataframe with as.data.frame.matrix. Example:

df = data.frame(Col1 = c("A","B","C","D","A","B","E"),Col2 = c("B","C","D","B","B","E","A"))
as.data.frame.matrix(table(df))

  A B C D E
A 0 2 0 0 0
B 0 0 1 0 1
C 0 0 0 1 0
D 0 1 0 0 0
E 1 0 0 0 0

18 million rows:

df = data.frame(Col1 = sample(letters,18000000,replace = T),sample(letters,18000000,replace = T))

a = Sys.time()
as.data.frame.matrix(table(df))
Sys.time()-a

Time difference of 0.5171118 secs. Hope this helps!

Upvotes: 2

Related Questions