Thomas Moore

Reputation: 352

Convert cbind() format for binomial glm in R to a dataframe with individual rows

Following the example here: input format for binomial glm in R, I have a dataset with y = cbind(success, failure), with each row representing one treatment.

My question is: How do I convert this to a "binary" format for each observation (e.g., y = 0 or 1 for each observation)? Working example here:

df1 <- data.frame(time = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                  symb = c('a', 'a', 'a', 'b', 'b', 'b','a', 'a', 'a', 'b', 'b', 'b'),
                  success= c(324,234,123,234,424,323,124,537,435,645,231,234),
                  failure= c(84,23,20,74,44,73,12,59,41,68,23,34))

Here success = 1 and failure = 0, and the final dataframe should have 4423 rows (sum(df1$success) + sum(df1$failure)). This answer gets to where I'm trying to go.

Upvotes: 1

Views: 1103

Answers (2)

Thomas Moore

Reputation: 352

Five years later, this code snippet is still proving useful. Based on @bouncyball's original answer, here's an updated version that uses pivot_longer instead of gather, replaces the loop through the rows with slice/rep, and builds the binary column with mutate/if_else:

library(tidyverse)

df1 <- data.frame(
  time = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  symb = c('a', 'a', 'a', 'b', 'b', 'b','a', 'a', 'a', 'b', 'b', 'b'),
  success = c(324,234,123,234,424,323,124,537,435,645,231,234),
  failure = c(84,23,20,74,44,73,12,59,41,68,23,34)
)

df1_long <- df1 %>%
  pivot_longer(
    cols = c(success, failure),
    names_to = "outcome",
    values_to = "count"
  )

df1_full <- df1_long %>%
  slice(rep(seq_len(n()), count)) %>%
  mutate(binary_code = if_else(outcome == "success", 1, 0))
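
One way to sanity-check the expansion, as a minimal sketch: the expanded data should have sum(success) + sum(failure) rows, and a binomial glm fitted to the 0/1 column should give essentially the same coefficients as the cbind() version. The model formula (factor(time) + symb) and the names fit_agg and fit_bin are assumptions chosen for illustration, not something from the question.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  time = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  symb = c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b'),
  success = c(324, 234, 123, 234, 424, 323, 124, 537, 435, 645, 231, 234),
  failure = c(84, 23, 20, 74, 44, 73, 12, 59, 41, 68, 23, 34)
)

# same expansion as above, written as a single pipe
df1_full <- df1 %>%
  pivot_longer(cols = c(success, failure),
               names_to = "outcome", values_to = "count") %>%
  slice(rep(seq_len(n()), count)) %>%
  mutate(binary_code = if_else(outcome == "success", 1, 0))

# one row per individual observation
nrow(df1_full) == sum(df1$success) + sum(df1$failure)

# aggregated (cbind) fit vs. fit on the expanded 0/1 data
# (illustrative formula; swap in whatever model you actually use)
fit_agg <- glm(cbind(success, failure) ~ factor(time) + symb,
               family = binomial, data = df1)
fit_bin <- glm(binary_code ~ factor(time) + symb,
               family = binomial, data = df1_full)

# coefficients should agree up to numerical tolerance
all.equal(coef(fit_agg), coef(fit_bin), tolerance = 1e-4)
```

Note that the residual deviance and degrees of freedom will differ between the two fits, since the expanded data has one observation per row; only the coefficient estimates and their standard errors are expected to match.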

Upvotes: 0

bouncyball

Reputation: 10761

Here's a way, using gather to reshape the data, and then hints from this answer to do the other heavy lifting.

library(tidyverse)
# convert to long format
df1_long <- df1 %>%
  gather(code, count, success, failure)
# function to repeat a data.frame
rep_df <- function(df, n){
  do.call('rbind', replicate(n, df, simplify = FALSE))
}
# loop through each row and then rbind together
df1_full <- do.call('rbind', 
                    lapply(seq_len(nrow(df1_long)), 
                           FUN = function(i) 
                             rep_df(df1_long[i,], df1_long[i,]$count)))
# create binary_code
df1_full$binary_code <- as.numeric(df1_full$code == 'success')

Here's what the first few rows look like:

#   time symb    code count binary_code
# 1    1    a success   324           1
# 2    1    a success   324           1
# 3    1    a success   324           1
# 4    1    a success   324           1
# 5    1    a success   324           1
# 6    1    a success   324           1

Upvotes: 2
