Reputation: 103
I am trying to generate a matrix of joint probabilities. It's a symmetric matrix. The main diagonal elements are interpreted as probabilities p ( A i ) that a binary variable A i equals 1. The off-diagonal elements are the probabilities p ( A i A j ) that both A i and A j are 1. This matrix should respond to the following conditions :
0 ≤ p A i ≤ 1
max ( 0 , p A i + p A j − 1 ) ≤ p A i A j ≤ min ( p A i , p A j ) , i ≠ j
p A i + p A j + p A k − p A i A j − p A i A k − p A j A k ≤ 1 , i ≠ j , i ≠ k , j ≠ k
These conditions are checked with check.commonprob.
I built a function to generate this matrix respecting these conditions:
# First I need another function to make the matrix symmetric
makeSymm <- function(m) {
m[upper.tri(m)] <- t(m)[upper.tri(m)]
return(m) }
b=matrix(0,10,10)
#The functionthat generates joint probabilities
joint=function(b,x,y,u,z,k,m){
repeat{
diag(b)=runif(k, min=x, max=y)
b[lower.tri(b,diag=FALSE)]<-runif(m,min=u, max=z)
b<-makeSymm(b)
check.commonprob(b)->c
if(c==TRUE)
break}
return(b)}
Since b is 10*10 matrix => there is 10 diagonal elements and 45 elements in the lower triangular matrix. I got this result:
b=joint(b,0.4,0.6,0.2,0.4,10,45)
> b
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.4479626 0.2128775 0.3103472 0.2342798 0.2719423 0.3114339 0.3978305
[2,] 0.2128775 0.4413829 0.2603543 0.2935595 0.2556380 0.2486850 0.2694443
[3,] 0.3103472 0.2603543 0.5170409 0.3003153 0.2651415 0.3410199 0.2321201
[4,] 0.2342798 0.2935595 0.3003153 0.5930984 0.2719581 0.3982266 0.3157343
[5,] 0.2719423 0.2556380 0.2651415 0.2719581 0.4031691 0.2157856 0.3016181
[6,] 0.3114339 0.2486850 0.3410199 0.3982266 0.2157856 0.4042654 0.2595399
[7,] 0.3978305 0.2694443 0.2321201 0.3157343 0.3016181 0.2595399 0.5195244
[8,] 0.3154185 0.3174374 0.2920965 0.3259053 0.2847335 0.3560568 0.2070868
[9,] 0.2892746 0.2510410 0.3232922 0.2970148 0.3070217 0.3445408 0.3180946
[10,] 0.2948818 0.2264481 0.3210267 0.2866854 0.3783635 0.3427585 0.2306935
[,8] [,9] [,10]
[1,] 0.3154185 0.2892746 0.2948818
[2,] 0.3174374 0.2510410 0.2264481
[3,] 0.2920965 0.3232922 0.3210267
[4,] 0.3259053 0.2970148 0.2866854
[5,] 0.2847335 0.3070217 0.3783635
[6,] 0.3560568 0.3445408 0.3427585
[7,] 0.2070868 0.3180946 0.2306935
[8,] 0.5958957 0.2710500 0.2318991
[9,] 0.2710500 0.5003779 0.2512744
[10,] 0.2318991 0.2512744 0.5004233
Up to now , everything seems good, but the problem is that when I wanted to generate a 100*100 matrix, I noticed that beyond a dimension of 20*20 the running time becomes so long (hours) and I can't get a result at the end because i have to stop it. Do you have any suggestions to improve this function so I can try it on 100*100 matrix ? Also can I stipulate the mean and the standard deviation of the joint probabilities matrix in advance? Thanks !
Upvotes: 2
Views: 1097
Reputation: 51998
If you are simply trying to generate examples of such matrices and don't have any other constraints, you can do so by generating observations from a population that would be implicitly described by such a matrix and then tabulate the observed probabilities. You can start by writing a function which does the tabulation:
p.matrix <- function(A){
n <- nrow(A)
k <- ncol(A)
outer(1:n,1:n,Vectorize(function(i,j) sum(A[i,]*A[j,])))/k
}
The above function can take any binary matrix and turn it into a matrix of probabilities that will statisfy check.commonprob
. To get a matrix of a given size you can do something like:
prob.matrix <- function(n,p = 0.5){
k <- max(1000,10*n^2)
pop <- replicate(k,ifelse(runif(n) < p,1,0))
p.matrix(pop)
}
For example:
> M <- prob.matrix(4,c(0.1,0.9,0.3,0.4))
> M
[,1] [,2] [,3] [,4]
[1,] 0.098 0.090 0.019 0.042
[2,] 0.090 0.903 0.278 0.366
[3,] 0.019 0.278 0.306 0.121
[4,] 0.042 0.366 0.121 0.410
> bindata::check.commonprob(M)
[1] TRUE
For n = 100
this takes about 30 seconds on my machine.
In this function the resulting variables are basically uncorrelated. To get correlated variables, replace the simple ifelse()
call by a custom function which e.g. doesn't allow for runs of 3 or more consecutive 1's. If you want finer control on the correlations, you would need to first be clear on just what you would want them to be.
Upvotes: 2