R: Creating a Function with a "Dynamic" Structure

Question

I am working with the R programming language.

Suppose there is a classroom of students. Each student flips the same coin many times (not every student flips it the same number of times). Here is a simulated dataset to represent this example:

library(tidyverse)
library(dplyr)

set.seed(123)
ids = 1:100
student_id = sample(ids, 1000, replace = TRUE)
coin_result = sample(c("H", "T"), 1000, replace = TRUE)
my_data = data.frame(student_id, coin_result)

my_data =  my_data[order(my_data$student_id),]

I want to count the "3-flip sequences" recorded by each student (e.g. if Student 1 got HHHTH, that is HHH once, HHT once, and HTH once).
I also want the probability of the third flip given the previous two flips (e.g. over all students, the probability of an H following HH was 0.54).

Here is some R code that performs these tasks:

results = my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result), lead(coin_result, 2)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)


final = results %>%
    mutate(two_seq = substr(Sequence, 1, 2)) %>%
    group_by(two_seq) %>%
    mutate(third = substr(Sequence, 3, 3)) %>%
    group_by(two_seq, third) %>%
    summarize(sums = sum(n)) %>%
    mutate(prob = sums / sum(sums))

My Question: Suppose I want to now extend this problem to "4 Flip Sequences" (e.g. probability of H given HHH) - I can manually extend this code:

results = my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result), lead(coin_result, 2), lead(coin_result, 3)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)

final = results %>%
    mutate(three_seq = substr(Sequence, 1, 3)) %>%
    group_by(three_seq) %>%
    mutate(fourth = substr(Sequence, 4, 4)) %>%
    group_by(three_seq, fourth) %>%
    summarize(sums = sum(n)) %>%
    mutate(prob = sums / sum(sums))

Is it possible to convert the above code into a function such that I can repeat this for arbitrary combinations? For example:

results <- function(i) {return(my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result), lead(coin_result, i+1), lead(coin_result, i+2) .....### insert code here ####), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence))}

final <- function(i) {
return(results %>%
    mutate(three_seq = substr(Sequence, 1, i)) %>%
    group_by(three_seq) %>%
    mutate(fourth = substr(Sequence, i+1, i+1)) %>%
    group_by(three_seq, fourth) %>%
    summarize(sums = sum(n)) %>%
    mutate(prob = sums / sum(sums)))
}

I am not sure how to do this, since the first function would need to change "dynamically" depending on the value of "i".

Can someone please show me how to do this?

Thanks!

Adam B. · Accepted Answer

Here's a way you can do it in base R:

# Returns a vector of 0's and 1's, bit more efficient than sample
tosses <- floor(runif(1e3, 0, 2)) 

count_seqs <- function(x, seq_length) {
  vec_length <- length(x)

  rolling_window_indices <- rep(1:seq_length, vec_length - seq_length + 1) +
    rep(0:(vec_length - seq_length), each = seq_length)

  mat <- matrix(x[rolling_window_indices], nrow = seq_length)
  sequences <- apply(mat, 2, paste0, collapse = "")
  table(sequences)
}

count_seqs(tosses, 3)
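
The question also asks for the probability of the last flip given the flips before it. Here is a minimal sketch of one way to get that from a table of sequence counts such as the one count_seqs() returns. The helper name cond_probs and the small example_counts table are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: convert a table of sequence counts into
# P(last flip | preceding flips).
cond_probs <- function(seq_counts) {
  sequences <- names(seq_counts)
  n <- nchar(sequences)
  out <- data.frame(
    prefix = substr(sequences, 1, n - 1),  # all flips except the last
    last   = substr(sequences, n, n),      # the final flip
    count  = as.vector(seq_counts),
    stringsAsFactors = FALSE
  )
  # ave() returns, for each row, the total count within that row's prefix
  out$prob <- out$count / ave(out$count, out$prefix, FUN = sum)
  out
}

# Stand-in for real output: HHH seen twice, HHT once, HTH once
example_counts <- table(c("HHH", "HHH", "HHT", "HTH"))
probs_table <- cond_probs(example_counts)
print(probs_table)
```

Because the prefix length is taken from the sequences themselves, the same helper should work for any sequence length of two or more.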

Notice I didn't include any ids in the code above. The reason is that, if all students have the same probability of tossing heads or tails, we can treat them as independent (or, more precisely, treat the design as ignorable). However, it's easy to expand the code for situations where the tosses are not independent, e.g. where each participant has a different probability of tossing heads:

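# Draw 1000 student ids from 1..100 (with replacement)
ids <- floor(runif(1e3, 1, 101))
# One probability of success (a toss coded as 1) per student
probs <- runif(1e2, 0, 1)

# For each drawn id, simulate 10 tosses using that student's probability
tosses_by_id <- lapply(ids, function(i) rbinom(10, 1, probs[i]))
# Count 3-flip sequences within each batch of 10 tosses
lapply(tosses_by_id, function(x) count_seqs(x, 3))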
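
For data shaped like the my_data frame in the question (one row per flip, with student_id and coin_result columns), the windows can be built within each student so that no sequence spans two students. The following is a sketch under that assumption; window_seqs and count_seqs_by_student are hypothetical names, and the code assumes each flip is a single character.

```r
# Hypothetical helper: all length-k windows of one student's flips.
window_seqs <- function(flips, k) {
  n <- length(flips)
  if (n < k) return(character(0))      # too few flips for even one window
  s <- paste(flips, collapse = "")     # e.g. c("H", "H", "T") -> "HHT"
  substring(s, seq_len(n - k + 1), k:n)
}

# Hypothetical helper: pooled counts of k-flip sequences, windowed per student.
count_seqs_by_student <- function(data, k) {
  per_student <- split(as.character(data$coin_result), data$student_id)
  seqs <- unlist(lapply(per_student, window_seqs, k = k), use.names = FALSE)
  table(seqs)
}

# Small illustrative dataset (not the simulated data from the question)
example_data <- data.frame(
  student_id  = c(1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  coin_result = c("H", "H", "H", "T", "H", "T", "T", "H", "H", "T")
)
counts <- count_seqs_by_student(example_data, 3)
print(counts)
```

Changing the second argument (for example to 4) gives longer sequences without touching the function body, which is the "dynamic" behaviour the question asks about.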
