Reputation: 352
Following the example here: input format for binomial glm in R, I have a dataset with y = cbind(success, failure), with each row representing one treatment.
My question is: How do I convert this to a "binary" format for each observation (e.g., y = 0 or 1 for each observation)? Working example here:
df1 <- data.frame(
  time    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  symb    = c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b'),
  success = c(324, 234, 123, 234, 424, 323, 124, 537, 435, 645, 231, 234),
  failure = c(84, 23, 20, 74, 44, 73, 12, 59, 41, 68, 23, 34)
)
Here success = 1 and failure = 0, and the final data frame should have 4423 rows, i.e. sum(df1$success) + sum(df1$failure). This answer gets to where I'm trying to go.
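As a quick sanity check on that target size, here is a small base R sketch (n_rows is only an illustrative name, not part of any package). Each binary row stands for one trial, so the row count is the total number of trials:

```r
# Example data from the question
df1 <- data.frame(
  time    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  symb    = c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b'),
  success = c(324, 234, 123, 234, 424, 323, 124, 537, 435, 645, 231, 234),
  failure = c(84, 23, 20, 74, 44, 73, 12, 59, 41, 68, 23, 34)
)

# One binary row per trial: 3868 successes plus 555 failures
n_rows <- sum(df1$success) + sum(df1$failure)
n_rows
# 4423
```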
Upvotes: 1
Views: 1103
Reputation: 352
Five years later, this code snippet is still proving useful. Based on @bouncyball's original answer, here's an updated version with pivot_longer replacing gather, and with slice and mutate/if_else replacing the loop through the rows:
library(tidyverse)

df1 <- data.frame(
  time    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  symb    = c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b'),
  success = c(324, 234, 123, 234, 424, 323, 124, 537, 435, 645, 231, 234),
  failure = c(84, 23, 20, 74, 44, 73, 12, 59, 41, 68, 23, 34)
)

# convert to long format: one row per treatment and outcome
df1_long <- df1 %>%
  pivot_longer(
    cols = c(success, failure),
    names_to = "outcome",
    values_to = "count"
  )

# repeat each row `count` times, then code the outcome as 1/0
df1_full <- df1_long %>%
  slice(rep(seq_len(n()), count)) %>%
  mutate(binary_code = if_else(outcome == "success", 1, 0))
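If tidyr is available anyway, the row-repetition step can probably be shortened further with tidyr's uncount(), which duplicates each row according to a weights column and drops that column by default. This is only a sketch of that alternative, not part of the original answers; df1_binary is an illustrative name.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  time    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  symb    = c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b'),
  success = c(324, 234, 123, 234, 424, 323, 124, 537, 435, 645, 231, 234),
  failure = c(84, 23, 20, 74, 44, 73, 12, 59, 41, 68, 23, 34)
)

df1_binary <- df1 %>%
  # one row per treatment and outcome
  pivot_longer(
    cols = c(success, failure),
    names_to = "outcome",
    values_to = "count"
  ) %>%
  # repeat each row `count` times; the count column is removed afterwards
  uncount(count) %>%
  # code success as 1 and failure as 0
  mutate(binary_code = if_else(outcome == "success", 1, 0))

nrow(df1_binary)
# 4423
```

Unlike the slice version, this result has no count column left over, which may or may not be what you want.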
Upvotes: 0
Reputation: 10761
Here's a way, using gather to reshape the data, and then hints from this answer to do the other heavy lifting.
library(tidyverse)

# convert to long format
# (gather() is superseded by pivot_longer() in current tidyr, but still works)
df1_long <- df1 %>%
  gather(code, count, success, failure)

# function to repeat a data.frame n times
rep_df <- function(df, n){
  do.call('rbind', replicate(n, df, simplify = FALSE))
}

# loop through each row, repeat it `count` times, and then rbind together
df1_full <- do.call('rbind',
                    lapply(seq_len(nrow(df1_long)),
                           FUN = function(i)
                             rep_df(df1_long[i,], df1_long[i,]$count)))

# create binary_code: 1 for success, 0 for failure
df1_full$binary_code <- as.numeric(df1_full$code == 'success')
Here's what the first few rows look like:
#   time symb    code count binary_code
# 1    1    a success   324           1
# 2    1    a success   324           1
# 3    1    a success   324           1
# 4    1    a success   324           1
# 5    1    a success   324           1
# 6    1    a success   324           1
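For comparison, here is a base R sketch of the same expansion without the row-by-row loop, followed by a check that the grouped and binary layouts give matching coefficient estimates in a binomial glm. The names long, idx, df1_binary, fit_grouped, fit_binary and coef_match are illustrative, and the model formula factor(time) + symb is an assumption chosen only for this demonstration; it is not taken from the question.

```r
df1 <- data.frame(
  time    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  symb    = c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b'),
  success = c(324, 234, 123, 234, 424, 323, 124, 537, 435, 645, 231, 234),
  failure = c(84, 23, 20, 74, 44, 73, 12, 59, 41, 68, 23, 34)
)

# stack the success and failure counts: one row per treatment and outcome
long <- data.frame(
  time        = rep(df1$time, times = 2),
  symb        = rep(df1$symb, times = 2),
  binary_code = rep(c(1, 0), each = nrow(df1)),
  count       = c(df1$success, df1$failure)
)

# repeat each row index `count` times, then keep everything except count
idx <- rep(seq_len(nrow(long)), times = long$count)
df1_binary <- long[idx, c("time", "symb", "binary_code")]
rownames(df1_binary) <- NULL

# fit the same (assumed) model to both layouts
fit_grouped <- glm(cbind(success, failure) ~ factor(time) + symb,
                   family = binomial, data = df1)
fit_binary  <- glm(binary_code ~ factor(time) + symb,
                   family = binomial, data = df1_binary)

# the coefficient estimates should agree up to numerical tolerance
coef_match <- isTRUE(all.equal(coef(fit_grouped), coef(fit_binary),
                               tolerance = 1e-4))
coef_match
```

The coefficients should agree, but the residual deviance and degrees of freedom will differ between the two fits, since the binary layout treats every trial as its own observation.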
Upvotes: 2