lukehawk

Reputation: 1493

Very large, very sparse Markov Transition Matrices

I have 10 variables, each with 10 individual states (deciles), and I am trying to create a 2D Markov transition matrix. This implies a matrix of 10^10 rows and 10^10 columns, which would be very sparse. That is far too large to work with directly, but I am hoping there is a way to deal with sparse matrices, and specifically sparse transition matrices.

If you are unfamiliar, a transition matrix gives the probabilities of moving from one 'state' to another in the next time step. Since I have so many variables, I have to project (if you will) the 10D state space onto the two dimensions of the matrix: every combination of the variables' states becomes a single row/column label. For example, if I had 2 variables, each either positive or negative, I would have 2^2 = 4 states, ++, --, +-, -+, and these would be the rows and columns. The values would be the probabilities of moving from the row state to the column state. It may be impossible to go from one state to another, yielding a zero in that position of the matrix. This produces something like the following:

           [+,+]      [+,-]      [-,-]      [-,+]
[+,+]  0.5500000  0.3500000  0.1000000  0.0000000
[+,-]  0.0000000  0.5500000  0.4500000  0.0000000
[-,-]  0.0000000  0.2500000  0.0000000  0.7500000
[-,+]  1.0000000  0.0000000  0.0000000  0.0000000

As you can infer, the resulting matrix may be (and probably is) very sparse: a massive matrix in which only certain regions actually contain useful information.
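For concreteness, this kind of empirical transition matrix can be estimated from an observed state sequence with base R alone. A minimal sketch, assuming a hypothetical character vector states holding the observed state at each time step:

## states: hypothetical vector of observed states over time,
## e.g. c("[+,+]", "[+,-]", "[-,-]", "[-,+]", "[+,+]", ...)
counts <- table(head(states, -1), tail(states, -1))  # transition counts (from -> to)
P <- prop.table(counts, margin = 1)                  # row-normalize to probabilities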

I am currently using the 'markovchain' package to generate the empirical transition matrix from data. (I am testing on a much simplified, smaller version, using only 3 variables, each with 3 individual states.) The package throws an error if I attempt to create a matrix this large, which would have many 'empty' regions. Is there a better package that supports very large, very sparse matrices?

This is the function I am using, which would produce the large sparse matrix, if it could:

theP <- markovchainFit(data = gdxReturnsUD)$estimate@transitionMatrix
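For reference, this is the kind of sparse storage I am hoping for. A minimal sketch using the Matrix package, purely to illustrate the format (seq_states is a hypothetical stand-in for my observed state sequence; only states that actually occur are indexed, so the 'empty' regions cost nothing):

library(Matrix)

## index only the states that actually occur in the data (hypothetical seq_states)
labs <- sort(unique(seq_states))
idx  <- match(seq_states, labs)
## duplicate (i, j) pairs are summed, giving transition counts
counts <- sparseMatrix(i = head(idx, -1), j = tail(idx, -1), x = 1,
                       dims = c(length(labs), length(labs)),
                       dimnames = list(labs, labs))
## row-normalize to a transition matrix; rows with no outgoing
## transitions simply stay empty
P <- Diagonal(x = 1 / rowSums(counts)) %*% counts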

Thanks!!!

Upvotes: 0

Views: 1729

Answers (2)

doron

Reputation: 464

You are going to have a hard time working with such large matrices no matter which method you use. This is not something you can do on a simple laptop; you are going to need a beast of a machine with an extremely large amount of RAM, or a computer cluster to do the heavy lifting (by partitioning the task, this should be possible).

Once you get your hands on such a machine, you will need to create the matrix manually: first create an empty matrix with the wanted dimensions, then gradually fill it up using a set of table() calls.

Something like this:

## create an empty count matrix with the wanted dimensions
dummymarkov <- matrix(0, nrow = 10^10, ncol = 10^10)
rownames(dummymarkov) <- unique(train$state)
colnames(dummymarkov) <- unique(train$state)
## subset your data (rows x to y)
tmp <- train[x:y, ]
## table the subset: transitions from each state to the next one
tmp_table <- table(tmp$state[1:(nrow(tmp) - 1)], tmp$state[2:nrow(tmp)])
## add the counts into the larger matrix
dummymarkov[rownames(tmp_table), colnames(tmp_table)] <-
  dummymarkov[rownames(tmp_table), colnames(tmp_table)] + tmp_table
## repeat the process until finished, then row-normalize dummymarkov
## to obtain the transition matrix
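A sparse variant of the same counting step may also be worth trying; a sketch assuming the same hypothetical train data frame (xtabs() can return a sparse count matrix via the Matrix package, so the empty regions cost no memory):

library(Matrix)

labs  <- sort(unique(train$state))
trans <- data.frame(from = factor(train$state[-nrow(train)], levels = labs),
                    to   = factor(train$state[-1],           levels = labs))
## sparse contingency table of transition counts
counts <- xtabs(~ from + to, data = trans, sparse = TRUE)
## row-normalize; rows that never transition out stay empty
P <- Diagonal(x = 1 / rowSums(counts)) %*% counts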

Upvotes: 0

Keyu Wu

Reputation: 1

I do not fully understand your question yet, but based on my guess, you might use a Dynamic Bayesian Network to represent your model. From a data-structure point of view, you need a more compressed data structure instead of a 2D matrix. You can Google "probabilistic graphical model" for more ideas.
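For example, one compressed structure along these lines is an edge list that stores only the observed (nonzero) transitions, never materializing the full matrix. A rough sketch in base R, assuming a hypothetical vector states of observed states:

## keep only observed transitions as (from, to) pairs
edges <- data.frame(from = states[-length(states)], to = states[-1], cnt = 1)
## sum the counts per (from, to) pair
probs <- aggregate(cnt ~ from + to, data = edges, FUN = sum)
## divide each count by its from-state total to get probabilities
probs$p <- probs$cnt / ave(probs$cnt, probs$from, FUN = sum)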

Upvotes: 0
