user2627717

Reputation: 344

How to generate matrices where A) each row has a single value of one; B) rows sum to one

This is a two-part problem. The first part is to create an N x N square matrix in which exactly one randomly chosen element in each row is 1 and all other elements are zero (i.e. the sum of the elements in each row is 1).

The second part is to create an N x N square matrix in which the elements of each row sum to 1, but each element follows a distribution, e.g. a normal distribution.

Related questions include "Create a matrix with conditional sum in each row -R". Matlab seems to do what I want automatically (see "Why this thing happens with random matrix such that all rows sum up to 1?"), but I am looking for a solution in R.

Here is what I tried:

# PART 1

N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,sample(N,1)]<- 1
})

(I get zeroes still)

# PART 2
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,]<- rnorm(N)
})

(It needs scaling)

Upvotes: 3

Views: 1896

Answers (3)

IRTFM

Reputation: 263301

Here's another loop-less solution. It uses matrix indexing with the "[<-" function: when a matrix is indexed by a two-column matrix, each row of the index is treated as a (row, column) pair. The first column of the index matrix is simply the ascending sequence 1:N, which selects each row once, and the second column is a random integer that picks the column position within that row. (It's a vectorized version of Matthew's "easiest method", and I suspect it would be faster, since there is only one call to sample.)

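N <- 20   # the output below has 20 rows
M <- matrix(0,N,N)
# each row of the index matrix is a (row, column) pair that is set to 1
M[ cbind(1:N, sample(1:N, N, rep=TRUE))] <- 1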

> rowSums(M)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

If you didn't specify rep=TRUE (which partially matches sample's replace argument), then colSums(M) would all be ones as well, but that was not what you requested. Sampling with replacement does mean the rank of the resulting matrix may be less than N; without rep=TRUE the result is a permutation matrix, which is full rank.
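
As a quick check of the matrix-indexing approach, the sketch below builds the matrix both with and without replacement and inspects the row sums, column sums and rank. The helper name one_per_row and the seed are assumptions made for illustration; they are not part of the answer.

```r
# Hypothetical helper: place a single 1 in each row of an N x N zero matrix.
one_per_row <- function(N, replace = TRUE) {
  M <- matrix(0, N, N)
  # Each row of the index matrix is a (row, column) pair to set to 1.
  M[cbind(seq_len(N), sample.int(N, N, replace = replace))] <- 1
  M
}

set.seed(1)  # arbitrary seed, only for reproducibility
A <- one_per_row(6)                   # columns may repeat
B <- one_per_row(6, replace = FALSE)  # a permutation matrix

rowSums(A)   # all ones
colSums(B)   # all ones as well, because no column is reused
qr(B)$rank   # 6: a permutation matrix has full rank
```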

Upvotes: 2

bartektartanus

Reputation: 16080

No-loop solution :)

n <- 5
# column that receives the 1 in each row
s <- sample(n, n, TRUE)
# offset of the start of each row in a row-major vector
w <- seq(1, n*n, by=n) - 1
index <- s + w
# vector of 0s
vec <- integer(n*n)
# put 1s
vec[index] <- 1
# voila :) fill the matrix row by row
matrix(vec, n, byrow = TRUE)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    0    0    1
[4,]    1    0    0    0    0
[5,]    1    0    0    0    0
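
The index arithmetic relies on the vector being laid out row by row: the entry in row i, column j sits at position (i - 1) * n + j. Here is a minimal sketch that checks this against an equivalent construction picking rows of the identity matrix. The diag(n)[s, ] variant is a substitute technique shown only for comparison, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 5
s <- sample(n, n, replace = TRUE)   # column of the 1 in each row

# Row-major linear index of element (i, s[i]) is (i - 1) * n + s[i].
index <- (seq_len(n) - 1) * n + s
vec <- integer(n * n)
vec[index] <- 1L
m1 <- matrix(vec, n, byrow = TRUE)

# Alternative: row i of the result is row s[i] of the identity matrix.
m2 <- diag(n)[s, ]

all(m1 == m2)          # TRUE
all(rowSums(m1) == 1)  # TRUE
```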

Upvotes: 0

Matthew Lundberg

Reputation: 42629

Here you see why lapply doesn't always replace a loop. You're trying to iterate over the rows of x and modify the matrix, but the assignment inside the function modifies a local copy of x; the x in the global environment is left unchanged.

The easiest fix is to use a for loop:

for (y in 1:N) {
  x[y,sample(N,1)]<- 1
}

The apply family of functions should be used for its return value, not to run functions with side effects.
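
A small sketch of why the lapply attempt leaves x unchanged, and of using the return value instead. The variable name cols is illustrative, and vapply is used here only to collect one column index per row.

```r
N <- 5
x <- matrix(0, N, N)

# The assignment inside the function modifies a local copy of x;
# the x defined above is untouched.
invisible(lapply(1:N, function(y) {
  x[y, sample(N, 1)] <- 1
}))
sum(x)   # 0

# Use the return value instead: one random column index per row.
cols <- vapply(1:N, function(y) sample(N, 1), integer(1))
x[cbind(1:N, cols)] <- 1
rowSums(x)   # 1 1 1 1 1
```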

One way to do this is to return the rows, then rbind them into a matrix. The second example (Part 2) is shown here, as it more closely resembles an apply:

do.call(rbind, lapply((1:N), function(i) rnorm(N)))

However, this is more readable:

matrix(rnorm(N*N), N, N)

Now to scale this so that the row sums equal 1. R stores a matrix column by column and recycles shorter vectors, so when you divide a matrix M by rowSums(M), the length-N vector of row sums is reused down each column and every element is divided by the sum of its own row. Using a more reasonable N=5:

N <- 5
m <- matrix(rnorm(N*N), N, N)
m/rowSums(m)
##           [,1]       [,2]        [,3]        [,4]        [,5]
## [1,] 0.1788692  0.5398464  0.24980924 -0.01282655  0.04430168
## [2,] 0.4176512  0.2564463  0.11553143  0.35432975 -0.14395871
## [3,] 0.3480568  0.7634421 -0.38433940  0.34175983 -0.06891932
## [4,] 1.1807180 -0.0192272  0.16500179 -0.31201400 -0.01447859
## [5,] 1.1601173 -0.1279919 -0.07447043  0.20865963 -0.16631458 
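
To confirm the recycling argument, the sketch below checks that the scaled rows sum to 1 and compares the result with sweep and prop.table, two base R functions that make the row-wise division explicit. The seed and variable names are assumptions for illustration.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
N <- 5
m <- matrix(rnorm(N * N), N, N)

# rowSums(m) has length N and is recycled down each column,
# so element [i, j] is divided by the sum of row i.
scaled <- m / rowSums(m)

# The same operation written explicitly.
scaled_sweep <- sweep(m, 1, rowSums(m), "/")
scaled_prop  <- prop.table(m, 1)

all.equal(rowSums(scaled), rep(1, N))   # TRUE
all.equal(scaled, scaled_sweep)         # TRUE
all.equal(scaled, scaled_prop)          # TRUE
```

Because normal draws can be negative, individual entries can fall outside [0, 1], and a row whose sum is close to zero produces very large scaled values.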

Upvotes: 1
