Reputation: 27
I have a dataset with a list of ID numbers and the values associated with each ID, but it is missing the row for "id4". I confirmed this by checking against a complete list of ID numbers, which shows a mismatch at row 4 (in principle, any one or several of the ids could have been missing).
id <- c("id1", "id2", "id3", "id5", "id6")
time <- c(1, 2.5, 1, 4.5, 2)
total <- c(5, 5, 5, 5, 5)
data <- data.frame(id, time, total)
data
#> id time total
#> 1 id1 1.0 5
#> 2 id2 2.5 5
#> 3 id3 1.0 5
#> 4 id5 4.5 5
#> 5 id6 2.0 5
id_list <- c("id1", "id2", "id3", "id4", "id5", "id6")
which(id_list %in% data$id)
#> [1] 1 2 3 5 6
Created on 2021-09-29 by the reprex package (v2.0.1)
I want to add a row where the missing id belongs, fill it with the correct id (so that the column matches "id_list"), and set "time" and "total" to 0 for that row. The final dataset would look like this:
#> id time total
#> 1 id1 1.0 5
#> 2 id2 2.5 5
#> 3 id3 1.0 5
#> 4 id4 0 0
#> 5 id5 4.5 5
#> 6 id6 2.0 5
But I'm not sure where to go after using %in% to identify which rows are missing.
Upvotes: 1
Views: 203
Reputation: 389175
A base R option using merge:
result <- merge(data.frame(id = id_list), data, all.x = TRUE)
result[is.na(result)] <- 0
result
# id time total
#1 id1 1.0 5
#2 id2 2.5 5
#3 id3 1.0 5
#4 id4 0.0 0
#5 id5 4.5 5
#6 id6 2.0 5
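For anyone who wants to continue from the lookup step in the question, here is a minimal base R sketch that uses match() instead of merge(). It rebuilds the question's data so it runs on its own; the names missing_ids, idx and filled are only illustrative. One difference worth noting: merge() sorts the result by the key column by default, whereas this version keeps the order of id_list, which only matters if id_list is not already sorted.

```r
# Rebuild the example data from the question
id <- c("id1", "id2", "id3", "id5", "id6")
time <- c(1, 2.5, 1, 4.5, 2)
total <- c(5, 5, 5, 5, 5)
data <- data.frame(id, time, total)
id_list <- c("id1", "id2", "id3", "id4", "id5", "id6")

# Ids that are expected but absent from the data
missing_ids <- setdiff(id_list, data$id)

# Position of each expected id in data$id; NA where the id is absent
idx <- match(id_list, data$id)

# Indexing rows with NA produces an all-NA row for each missing id
filled <- data[idx, ]
filled$id <- id_list            # restore the id for the new rows
filled[is.na(filled)] <- 0      # remaining NAs are in time and total
rownames(filled) <- NULL        # renumber the rows 1..n
filled
```

This assumes the ids in data are unique; with duplicated ids, match() would only pick up the first occurrence of each.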
Upvotes: 0
Reputation: 887691
We may use complete() from tidyr:
library(dplyr)
library(tidyr)
data %>%
  complete(id = id_list, fill = list(time = 0, total = 0))
Output:
# A tibble: 6 × 3
id time total
<chr> <dbl> <dbl>
1 id1 1 5
2 id2 2.5 5
3 id3 1 5
4 id4 0 0
5 id5 4.5 5
6 id6 2 5
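If an explicit join reads more clearly, a similar result can probably be reached with left_join() followed by replace_na(). This is only a sketch under the same assumptions as the question (unique ids, numeric time and total); the name result is illustrative, and the data is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
data <- data.frame(
  id = c("id1", "id2", "id3", "id5", "id6"),
  time = c(1, 2.5, 1, 4.5, 2),
  total = c(5, 5, 5, 5, 5)
)
id_list <- c("id1", "id2", "id3", "id4", "id5", "id6")

# Start from the full id list so every expected id gets a row,
# then replace the NAs introduced by the join with 0
result <- tibble(id = id_list) %>%
  left_join(data, by = "id") %>%
  replace_na(list(time = 0, total = 0))

result
```

Because the full id list is the left-hand table, the rows come out in the order of id_list.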
Upvotes: 1