Reputation: 9809
I am trying to create a matrix with random numbers where the rowSums
should exactly be 1.
I already have a condition which checks if the rowSums
is not 1 and tries to correct it.
When I print out the result it looks correct but if I test if all values are 1 it gives me some FALSE values.
How can I correct that?
library(Rcpp)
cppFunction('
NumericMatrix imembrandc(int n, int k) {
NumericMatrix u( n , k );
IntegerVector sequ = seq(1,100);
NumericVector sampled;
for (int i=0; i < k; ++i) {
sampled = sample(sequ, n);
u(_,i) = sampled / sum(sampled);
}
if (is_true(any(rowSums(u) != 1))) {
u(_,1) = u(_,1) + (1 - rowSums(u));
}
return(u);
}')
When I print out the rowSums
of the result it looks correct:
res = imembrandc(n = 10, k = 5)
rowSums(res)
[1] 1 1 1 1 1 1 1 1 1 1
But checking it gives some FALSEs:
rowSums(res) == 1
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE
Upvotes: 3
Views: 106
Reputation: 26823
The canonical way to generate n
random numbers that sum to 1 is to generate n - 1
values from [0,1)
, add 0 and 1 to the list and take the difference of the sorted list. Of course, this depends on the distribution you want for the random numbers. This can be expressed in R as
set.seed(42)
v <- diff(sort(c(0, runif(5), 1)))
v
#> [1] 0.28613953 0.35560598 0.18870211 0.08435842 0.02226937 0.06292459
sum(v)
#> [1] 1
Created on 2019-05-24 by the reprex package (v0.2.1)
In your case in C++:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix imembrandc(int n, int k) {
NumericMatrix u(n, k);
for (int i = 0; i < n; ++i) {
NumericVector row = runif(k - 1);
row.push_back(0.0);
row.push_back(1.0);
u(i, _) = diff(row.sort());
}
return u;
}
/*** R
set.seed(42)
res = imembrandc(n = 10, k = 5)
rowSums(res)
rowSums(res) == 1
all.equal(rowSums(res),rep(1, nrow(res)))
*/
Note that I am generating rows to begin with, while you were generating columns and then tried to correct the rowSum
. Output:
> set.seed(42)
> res = imembrandc(n = 10, k = 5)
> rowSums(res)
[1] 1 1 1 1 1 1 1 1 1 1
> rowSums(res) == 1
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> all.equal(rowSums(res),rep(1, nrow(res)))
[1] TRUE
BTW, all.equal
gives TRUE
also for your matrix, since the difference is really small. But I find it better to avoid the problem from the beginning.
Upvotes: 4