Yukun Liang

Reputation: 13

Corresponding sums of list data

I am trying to simulate, in R, 500 sequences of 0s and 1s, each of length 1000, and to figure out the average number of steps until 3 consecutive ones first appear. This is the same as finding the expected number of times a coin must be tossed until it comes up heads three times in a row. rle is the run-length encoding function, which converts a sequence into runs of consecutive identical elements.

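# 500 sequences (rows) of 1000 fair coin tosses each; 1 = heads
uu <- matrix(sample(c(0, 1), 500000, replace = TRUE, prob = c(1/2, 1/2)), ncol = 1000)
# run-length encode every row
yy <- apply(uu, 1, rle)
# indices of the runs that contain at least 3 consecutive ones
f1 <- function(yy) {
  which(yy$lengths > 2 & yy$values == 1)
}
tt <- sapply(yy, f1)
# index of the first such run in each sequence
oo <- sapply(tt, function(tt) return(tt[1]))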

This gives me the first element of each tt, i.e. the index of the first run containing 3 consecutive heads. Now I want to sum the lengths of all runs thrown before that first run of 3 consecutive heads, but I don't know how to do this for all 500 sequences, each with its own corresponding index.

f2<-function(yy){sum(yy$length[1:oo[1:500]])+3}
kk<-sapply(yy,f2)
mean(kk)

However, f2 doesn't work, since R only uses the first element of oo. How can I sum the elements of yy$lengths up to the corresponding element of oo? Please tell me if there is a more convenient way to do this simulation. Thanks a lot.

Upvotes: 1

Views: 78

Answers (2)

Yukun Liang

Reputation: 13

First, build 500 sequences of 0s and 1s, each of length 1000.

A <- matrix(sample(c(0, 1), 500000, replace = TRUE), ncol = 1000)

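Second, define a function that finds the first run of at least 3 consecutive ones and sums the lengths of all runs before it. Since sum(r$lengths[seq_len(BB[1] - 1)]) only gives the number of throws before the 3 consecutive ones, 3 more steps are needed to get the answer, hence the +3 after the sum.

AA <- function(x, y) {
  r <- rle(x)                                 # encode once and reuse
  BB <- which(r$lengths > 2 & r$values == y)  # runs of at least 3 y's
  if (length(BB) == 0)
    return(0)                                 # no such run in this sequence
  # throws before the first qualifying run, plus the 3 that complete it
  sum(r$lengths[seq_len(BB[1] - 1)]) + 3
}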

Apply this function to the original matrix.

CC<-apply(A,1,AA,y=1)

The mean of CC is then the simulated average number of steps needed to get 3 consecutive ones.

mean(CC)
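
The approach from the question can also be kept, as long as each run-length encoding is paired with its own index. The following is only a sketch of that idea: first_run_end is a hypothetical helper name, the seed is arbitrary, and sequences without a qualifying run are reported as NA rather than 0 (an assumption that differs from AA above).

```r
# Hypothetical helper (not from the original posts): number of throws needed
# to complete the first run of at least k ones, given the run lengths and
# the index idx of that run.
first_run_end <- function(lengths, idx, k = 3L) {
  if (is.na(idx)) return(NA_real_)       # no qualifying run in this sequence
  sum(lengths[seq_len(idx - 1L)]) + k    # throws before the run, plus k
}

set.seed(1)  # arbitrary seed, only for reproducibility
uu <- matrix(sample(0:1, 500 * 1000, replace = TRUE), ncol = 1000)

# one rle object per row
runs <- lapply(seq_len(nrow(uu)), function(i) rle(uu[i, ]))

# index of the first run of >= 3 ones in each sequence (NA if there is none)
oo <- vapply(runs,
             function(r) which(r$lengths > 2 & r$values == 1)[1],
             integer(1))

# mapply walks over runs and oo in parallel, one pair per sequence
kk <- mapply(function(r, i) first_run_end(r$lengths, i), runs, oo)
mean(kk, na.rm = TRUE)
```

For a fair coin the theoretical expected number of tosses until three heads in a row is 14, so the simulated mean should land close to that value.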

Upvotes: 0

Uwe

Reputation: 42544

If I understand correctly, the OP wants to simulate how many throws it takes on average until exactly 3 heads occur consecutively.

Here is an example of how this can be solved using replicate():

nt <- 20L # number of throws in a sequence
nr <- 10L # number of repetitions
set.seed(42) # for reproducible results
mean(
  replicate(
    nr, { 
      throws <- sample(0:1, nt, replace = TRUE)
      print(throws)
      throws_rle <- rle(throws)
      rle_len <- throws_rle[["lengths"]]
      rle_val <- throws_rle[["values"]]
      idx_first_3_heads <- head(which(rle_val == 1L & rle_len == 3L), 1L)
      # add lengths of previous throws
      n_previous_throws <- if (length(idx_first_3_heads) > 0) 
      {
        sum(head(rle_len, idx_first_3_heads - 1L))
      } else {
        NA
      }
      cat("First occurrence of exactly 3 heads after", n_previous_throws, "throws\n")
      n_previous_throws
    }
  ),
  na.rm = TRUE
)
 [1] 0 0 0 0 1 1 1 1 0 1 0 1 0 1 0 0 1 1 1 1
First occurrence of exactly 3 heads after NA throws
 [1] 0 0 0 0 0 1 0 0 0 0 1 1 1 1 0 1 0 1 1 1
First occurrence of exactly 3 heads after 17 throws
 [1] 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 1 0 0
First occurrence of exactly 3 heads after NA throws
 [1] 0 1 1 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0
First occurrence of exactly 3 heads after NA throws
 [1] 1 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 1 1
First occurrence of exactly 3 heads after 13 throws
 [1] 1 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1
First occurrence of exactly 3 heads after NA throws
 [1] 0 1 1 1 1 0 0 1 0 0 0 1 0 0 1 1 1 0 0 1
First occurrence of exactly 3 heads after 14 throws
 [1] 0 0 1 0 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 1
First occurrence of exactly 3 heads after 9 throws
 [1] 0 1 1 0 1 1 0 0 1 0 1 0 1 1 0 1 0 1 1 1
First occurrence of exactly 3 heads after 17 throws
 [1] 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1
First occurrence of exactly 3 heads after 0 throws
[1] 11.66667

Note that this code is hopefully self-explanatory and is meant for demonstration only. It will need to be streamlined for production use.

The if clause is required to distinguish between two situations:

  • where no 3 consecutive heads occur at all in a sequence (returns NA)
  • and where the 3 consecutive heads occur right at the start (returns 0 as there are no previous throws).

Both situations can be found in the sample use case.
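
A streamlined variant without the diagnostic printing might look like the sketch below. The helper name throws_before_run and its k argument are assumptions introduced for illustration; the logic mirrors the replicate() example, counting the throws before the first run of exactly 3 heads.

```r
# Hypothetical helper (name and signature are illustrative): number of throws
# before the first run of exactly k heads, or NA if no such run exists.
throws_before_run <- function(throws, k = 3L) {
  r <- rle(throws)
  idx <- which(r$values == 1L & r$lengths == k)
  if (length(idx) == 0L) return(NA_integer_)  # no run of exactly k heads
  sum(head(r$lengths, idx[1L] - 1L))          # 0 if the run comes first
}

# Checks against three of the sequences printed above
throws_before_run(c(1,1,1,0,1,1,1,0,1,1,0,1,1,1,1,1,0,0,1,1))  # 0
throws_before_run(c(0,0,1,0,0,0,1,1,0,1,1,1,0,0,0,0,0,0,0,1))  # 9
throws_before_run(c(0,0,0,0,1,1,1,1,0,1,0,1,0,1,0,0,1,1,1,1))  # NA

# Simulation without printing
nt <- 20L  # number of throws in a sequence
nr <- 10L  # number of repetitions
res <- vapply(seq_len(nr),
              function(i) throws_before_run(sample(0:1, nt, replace = TRUE)),
              integer(1))
mean(res, na.rm = TRUE)
```

Returning NA_integer_ keeps the result type consistent, so vapply() can check that every repetition yields a single integer.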

Upvotes: 1
