Reputation: 442
Given:
df <- data.frame(
CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","PelletCoffeeCo","PelletCoffeeCo"),
Email= c("[email protected]", "[email protected]","[email protected]","[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","1","2"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6,2,3))
I need to figure out how to get to:
df2 <- data.frame(CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","Liquders","Liquders","Liquders",
"Liquders","Liquders","Liquders","Liquders", "PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo"),
Email= c("[email protected]", "[email protected]","[email protected]","[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","4","5","6","7","8","9","10",
"1","2","3","4","5","6","7","8","9","10"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6, NA,NA,NA,NA,NA,NA,NA, 2,3,NA,NA,NA,NA,NA,NA,NA,NA))
Explanation: I have data where I surveyed people once a day over a course of 10 days. In a perfect world, I would have 10 responses from each participant, denoted by day1:day10. However, due to non-response, some participants gave 3 responses, others, 6, and others 10 and etc. I'm setting the data up to run a growth model, and so I need the column "Day" to always read Day1 - Day 10, regardless if there's data for those responses. I've tried to demonstrate this by adding NA's to rows that don't have all 10 days of data.
Upvotes: 0
Views: 60
Reputation: 4224
Try this:
library(tidyr)
df %>%
complete(nesting(CompanyID,Email), Day = seq(min(Day), max(Day), 1L)) %>%
data.frame()
Output:
CompanyID Email Day var1
1 Drinkers [email protected] 1 4
2 Drinkers [email protected] 2 5
3 Drinkers [email protected] 3 5
4 Drinkers [email protected] 4 5
5 Drinkers [email protected] 5 5
6 Drinkers [email protected] 6 2
7 Drinkers [email protected] 7 3
8 Drinkers [email protected] 8 2
9 Drinkers [email protected] 9 7
10 Drinkers [email protected] 10 6
11 Liquders [email protected] 1 7
12 Liquders [email protected] 2 NA
13 Liquders [email protected] 3 6
14 Liquders [email protected] 4 6
15 Liquders [email protected] 5 NA
16 Liquders [email protected] 6 NA
17 Liquders [email protected] 7 NA
18 Liquders [email protected] 8 NA
19 Liquders [email protected] 9 NA
20 Liquders [email protected] 10 NA
21 PelletCoffeeCo [email protected] 1 2
22 PelletCoffeeCo [email protected] 2 NA
23 PelletCoffeeCo [email protected] 3 3
24 PelletCoffeeCo [email protected] 4 NA
25 PelletCoffeeCo [email protected] 5 NA
26 PelletCoffeeCo [email protected] 6 NA
27 PelletCoffeeCo [email protected] 7 NA
28 PelletCoffeeCo [email protected] 8 NA
29 PelletCoffeeCo [email protected] 9 NA
30 PelletCoffeeCo [email protected] 10 NA
Edit:
The code above fills each group's Day column values with a complete set of Day values defined by the minimum and maximum of the existing values in that column (i.e. 1 and 10, respectively). The groups by which these Day values are filled can be redefined as needed, but I chose to define them here as Company + Email with the line "nesting(CompanyID,Email)". The data.frame() line is just there to convert the output to a data.frame instead of a tibble. If a data.frame output isn't necessary, feel free to replace or remove that line.
Upvotes: 2
Reputation: 1764
First, create a data frame of the unique company ID. Next, create a data frame of the desired days.
Cross Join these together.
Then join to your original dataset to fill out the table.
comp <- data.frame(CompanyID = unique(df$CompanyID))
Day <- data.frame(Day = c("1","2","3","4","5","6","7","8","9","10"))
compDay <- merge(comp, Day, all = TRUE)
dfday <- merge(df, compDay, by = c("CompanyID", "Day"), all = TRUE)
Upvotes: 0