Reputation: 31
everyone!
how to generate a vector which satisfy some conditions?
Problem: generate a vector a
such that length(a)=400000
which is made up of 8 elements:0, 5, 10, 50, 500, 5000, 50000, 300000
. Each element appears a set number of times, namely 290205, 100000, 8000, 1600, 160, 32, 2, 1
, respectively. Further, a
is blocked into 4,000 "groups" of 100 consecutive elements; call them a_k, k=1,...,4000
. These groups must satisfy the following:
sum_i a_k_i>150
for all k
.5
, 10
and 50
appear between 25 and 29 times in each group, i.e. for all k
, the set {i|a_i_k in (5,10,50)}
has magnitude between 25 and 29.0
never appears more than 8 times in a row in any group. I have tried this many times, but it does not seem to work: My current code is as follows:
T <- 4*10^(5) # data size
x <- c(0, 5, 10, 50, 500, 5000, 50000, 300000) #seed vector
t <- c(290205, 100000, 8000, 1600, 160, 32, 2, 1) #frequency
A <- matrix(0, 4000, 100) #4000 groups
k <- rep(0, times = 8) #record the number of seeds
for(m in 1:4000) {
p <- (t - k)/(T - 100*(m - 1)) #seed probability
A[, m] <- sample(x, 100, replace = TRUE, prob = p) #group m
sm <- 0
i <- 0
for(j in 1:92) {
if(sum(A[m,j:j + 8])==0){
if(A[m,j] > 0 & A[m,j] < 500) {i <- i+1}
sm <- sm+A[100*m+j]
}
else j <- 0
}
if (sm >= 150 & i > 24 & i < 30 & j != 0) {
m <- m + 1
for (n in seq_len(x)) {
k[n] <- sum(A[, m+1] == x[n]) + k[n]
}
}
}
Upvotes: 3
Views: 618
Reputation: 31
Thanks for everyone! I have figured out my problem.
rm(list = ls())
media <- matrix(rep(rep(c(0,5,NA),c(72,25,3)),4000),nrow=100)
media[98:100,1:2400] <-c(10,10,10)
media[98:99,2401:3200] <-c(50,10)
media[98:99,3201:4000] <-c(50,0)
media[100,2401:4000] <-rep(c(0,500,5000,50000,300000),c(1405,160,32,2,1))
obj1 <- matrix(0,100L,4000)
obj2 <-obj1
grp.cond<-function(x) max(rle(x)$lengths[rle(x)$values==0])<=8
elts<-c(0,5,10,50,500,5000,50000,300000)
for(i in 1:4000){
freq<-c(sapply(elts, function(x) length(which(media[,i]==x))))
while(!grp.cond(obj1[,i])){obj1[,i]<-c(sample(rep(elts,freq)))}
i<-i+1
}
elts1<-c(1:4000)
freq1<-rep(1,times=4000)
a1<-sample(rep(elts1,freq1))
for(i in 1:4000){obj2[,i]<-obj1[,a1[i]]}
a <- c(obj2)
Upvotes: 0
Reputation: 34703
Inspired by @plafort's approach, I've come up with the following that seems to work very quickly and should be capable of generating all vectors satisfying your conditions:
elts<-c(0,5,10,50,500,5000,50000,300000)
freq<-c(290205,100000,8000,1600,160,32,2,1)
ngrp<-4000L
grp.cond1<-function(x)sum(x)>=150
grp.cond2<-function(x)abs(sum(x %in% c(5,10,50))-27)<=2
grp.cond3<-function(x)max(rle(x)$lengths[rle(x)$values==0])<=8
check.all<-function(mat){
all(sapply(1:ncol(mat),function(y)grp.cond1(mat[,y])),
sapply(1:ncol(mat),function(y)grp.cond2(mat[,y])),
sapply(1:ncol(mat),function(y)grp.cond3(mat[,y])))}
while(!check.all(amat)){amat<-matrix(sample(rep(elts,freq)),ncol=ngrp)}
a<-c(amat)
I've also written the code in a way that should be easy to generalize to other element sets/counts, group numbers, and group-wise conditions.
Unfortunately it seems these conditions are pretty stringent, and it may take a long time to generate an acceptable a
. I let the while
loop run ~1300 times with no success...
Upvotes: 2
Reputation: 28441
I can start it off and maybe someone can help get to the next step. My approach is to start with the constraints and let sample
work out the numbers.
set.seed(77)
choose <- c(0,5,10,50,500,5000,50000,300000)
freqs <- c(290205,100000,8000,1600,160,32,2,1)
probs <- freqs/sum(freqs)
check.sum <- function(vec) sum(vec) >= 150
check.interval <- function(vec) abs(sum(vec %in% c(5,10,50))-27)<=2
check.runs <- function(vec, runmax=8) max(rle(vec)$lengths[rle(vec)$values==0]) <= runmax
check.all <- function(vector) {
logicals <- c(check.sum(vector),
check.runs(vector),
check.runs(vector)
)
return(all(logicals))
}
nums <- NULL
res <- list()
for(i in 1:4000) {
nums <- numeric(100)
while(!check.all(nums)) {nums <- sample(choose, 100, replace=T,prob=probs)}
res[i] <- list(nums)
}
str(res)
List of 4000
$ : num [1:100] 1e+01
So this gets you a list of 4,000 groups of 100 numbers that fit the constraints. It only took about two seconds of system time.
Next step is for someone to get a way to build something similar except eliminate 300000 once it is used, and 50000 once it is used twice and so on.
Upvotes: 2
Reputation: 34703
How about just doing it by construction? For example:
amat<-matrix(rep(c(rep(rep(c(0,5),c(8,3)),8),
rep(c(0,NA),c(8,4))),4000),nrow=100)
amat[97:100,1:2205]<-c(rep(10,3),0)
amat[97:98,2206:4000]<-c(5,5)
amat[99:100,2206:2897]<-c(10,10)
amat[99:100,2898]<-c(5,50)
amat[99:100,2899:3307]<-c(5,50)
amat[99:100,3308:3902]<-c(50,50)
amat[which(is.na(amat))]<-rep(c(10,500,5000,5e4,3e5),c(1,160,32,2,1))
a<-c(amat)
This satisfies all your conditions:
Element counts:
>sapply(c(0,5,10,50,500,5000,50000,300000),function(x)length(which(a==x)))
[1] 290205 100000 8000 1600 160 32 2 1
Group sums:
> table(colSums(amat)>=150)
TRUE
4000
5,10,50
frequency:
> table(sapply(1:4000,function(x)abs(sum(amat[,x] %in% c(5,10,50))-27)<=2))
TRUE
4000
Runs of 0
:
> table(sapply(1:4000,function(x)max(rle(amat[,x])$lengths[rle(amat[,x])$values==0])<=8))
#If this is slow, we can just use max(rle(amax[,x]))<=8
# because there aren't many valid groups with strings of 9+
# non-0 elements
TRUE
4000
if in fact we're never allowed to have strings of 9 0
s, we'll need to make a slight adjustment to groups 2:2206, because, e.g. a[100:108]==0
Upvotes: 2