Adam Hoelscher
Adam Hoelscher

Reputation: 1892

Generating ID numbers by class for a simulated sample in R

I'm trying to simulate ID numbers for population that covers multiple classes with the following parameters

Generate some Example data

data(mtcars)
set.seed(9999)
mtcars$count<-sample(x = 1:100,size = 32,replace=T)
mtcars
Car.Sample<-sample(1:sum(mtcars$count),15)

So, I'm simulating 15 cars out of the population (1774). My initial thought on how to calculate the ID numbers for each member of the sample is to use the number I sampled and roll down the records of mtcars until the sum of the count exceeds the sample number. Then subtract the sum of count from all records prior to that one and the remainder is the ID number of the car inside that class. e.g.

Car.ID<-function(x){
  Commute <- 0
  Counter <- 0
  while (Commute<x){
    Counter <- Counter + 1
    Commute <- Commute + mtcars[Counter,'count']
  }

  # we overshot the count so we need to step back one iteration
  Commute <- Commute - mtcars[Counter,'count']

  Class <- rownames(mtcars)[Counter]
  ID.Num <- x - Commute
  temp <- paste(Class,ID.Num,sep=':')
  return(temp)
}

This function generates correct results in that if I feed in every possible sample number, I get a list of assigned IDs that is consistent with the rules above. The problem is that it is slower than spit. My actual use case has 1000 classes and I may need to simulate sample size on the order of 10^5 or 10^6.

  1. Is there a way to optimize this logic?
  2. Is there a more efficient logic to assign these IDs?

Thanks for the help.

Best answer so far: Optimize using cumsum function (@patabongo)

mtcars$Commute <- cumsum(mtcars$count)
Car.ID <- function(x) {
  row <- head(which(mtcars$Commute >= x), n = 1)
  Commutation <- mtcars$Commute[row-1]
  if (length(Commutation)==0) {Commutation <- 0}
  return(paste(rownames(mtcars)[row], x - Commutation, sep = ":"))
}

Upvotes: 1

Views: 210

Answers (1)

stuwest
stuwest

Reputation: 928

One way is to assign a cumulative sum column to mtcars so you're not having to recalculate that all the time.

mtcars$cumsum <- cumsum(mtcars$count)

Car.ID <- function(x) {
    if (x < mtcars$cumsum[1]) {
        return(paste(rownames(mtcars)[1], x, sep = ":"))
    } else {
        row <- tail(which(mtcars$cumsum < x), n = 1)
        return(paste(rownames(mtcars)[row + 1], x - mtcars$cumsum[row], sep = ":"))
    }
}

sapply(Car.Sample, Car.ID)

Upvotes: 1

Related Questions