Reputation: 1892
I'm trying to simulate ID numbers for population that covers multiple classes with the following parameters
Generate some Example data
data(mtcars)
set.seed(9999)
mtcars$count<-sample(x = 1:100,size = 32,replace=T)
mtcars
Car.Sample<-sample(1:sum(mtcars$count),15)
So, I'm simulating 15 cars out of the population (1774). My initial thought on how to calculate the ID numbers for each member of the sample is to use the number I sampled and roll down the records of mtcars until the sum of the count exceeds the sample number. Then subtract the sum of count from all records prior to that one and the remainder is the ID number of the car inside that class. e.g.
Car.ID<-function(x){
Commute <- 0
Counter <- 0
while (Commute<x){
Counter <- Counter + 1
Commute <- Commute + mtcars[Counter,'count']
}
# we overshot the count so we need to step back one iteration
Commute <- Commute - mtcars[Counter,'count']
Class <- rownames(mtcars)[Counter]
ID.Num <- x - Commute
temp <- paste(Class,ID.Num,sep=':')
return(temp)
}
This function generates correct results in that if I feed in every possible sample number, I get a list of assigned IDs that is consistent with the rules above. The problem is that it is slower than spit. My actual use case has 1000 classes and I may need to simulate sample size on the order of 10^5 or 10^6.
Thanks for the help.
Best answer so far: Optimize using cumsum
function (@patabongo)
mtcars$Commute <- cumsum(mtcars$count)
Car.ID <- function(x) {
row <- head(which(mtcars$Commute >= x), n = 1)
Commutation <- mtcars$Commute[row-1]
if (length(Commutation)==0) {Commutation <- 0}
return(paste(rownames(mtcars)[row], x - Commutation, sep = ":"))
}
Upvotes: 1
Views: 210
Reputation: 928
One way is to assign a cumulative sum column to mtcars so you're not having to recalculate that all the time.
mtcars$cumsum <- cumsum(mtcars$count)
Car.ID <- function(x) {
if (x < mtcars$cumsum[1]) {
return(paste(rownames(mtcars)[1], x, sep = ":"))
} else {
row <- tail(which(mtcars$cumsum < x), n = 1)
return(paste(rownames(mtcars)[row + 1], x - mtcars$cumsum[row], sep = ":"))
}
}
sapply(Car.Sample, Car.ID)
Upvotes: 1