Reputation: 1928
I am almost positive I can use tidyr::complete
to fill in missing records, but I haven't been able to get it to work properly through using expand
or nesting
. I haven't found any examples that have been very helpful. I have used complete
for single columns, but never multiple.
I have a dataset with group
, subgroup
, low_end
, and high_end
. Low and high are always 5 apart. For every possible pairing of group
, subgroup
, low_end
+high_end
combination, I need a minimum of 1 record in the data. Not all possible combinations of low_end
and high_end
will be in the data (in this reprex 20/25).
df <- data.frame(
group = c(rep('A', 2), rep('B', 2), rep('C', 1)),
subgroup = c(1,2,2,2,1),
low_end = c(5, 10, 10, 15, 25),
high_end = c(10, 15, 15, 20, 30),
value = c(rep(99, 5))
)
group subgroup low_end high_end value
1 A 1 5 10 99
2 A 2 10 15 99
3 B 2 10 15 99
4 B 2 15 20 99
5 C 1 25 30 99
Desired Output:
desired_result <- data.frame(
group = c(rep('A', 10), rep('B', 10), rep('C', 10)),
subgroup = c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5)),
low_end = c(rep(full_seq(x=c(5,25), 5), 6)),
high_end = c(rep(full_seq(x=c(10,30), 5), 6)),
value = c(99, rep(NA, 5), 99, rep(NA, 9), 99, 99, rep(NA, 6), 99, rep(NA, 5))
)
group subgroup low_end high_end value
1 A 1 5 10 99
2 A 1 10 15 NA
3 A 1 15 20 NA
4 A 1 20 25 NA
5 A 1 25 30 NA
6 A 2 5 10 NA
7 A 2 10 15 99
8 A 2 15 20 NA
9 A 2 20 25 NA
10 A 2 25 30 NA
11 B 1 5 10 NA
12 B 1 10 15 NA
13 B 1 15 20 NA
14 B 1 20 25 NA
15 B 1 25 30 NA
16 B 2 5 10 NA
17 B 2 10 15 99
18 B 2 15 20 99
19 B 2 20 25 NA
20 B 2 25 30 NA
21 C 1 5 10 NA
22 C 1 10 15 NA
23 C 1 15 20 NA
24 C 1 20 25 NA
25 C 1 25 30 99
26 C 2 5 10 NA
27 C 2 10 15 NA
28 C 2 15 20 NA
29 C 2 20 25 NA
30 C 2 25 30 NA
The documentation says to use expand
to include values missing from the data, so I figure it has to be something close to...
attempt <- df %>%
complete(expand(group, subgroup, low_end, high_end), fill= NA)
but again, I'm clearly missing something important here.
Upvotes: 3
Views: 948
Reputation: 887971
We could do the seq
inside the nesting
library(dplyr)
library(tidyr)
df %>%
complete(group, subgroup, nesting(low_end = seq(min(low_end),
max(low_end), by = 5),
high_end = seq(min(high_end), max(high_end), by = 5)))
-output
# A tibble: 30 x 5
# group subgroup low_end high_end value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 A 1 5 10 99
# 2 A 1 10 15 NA
# 3 A 1 15 20 NA
# 4 A 1 20 25 NA
# 5 A 1 25 30 NA
# 6 A 2 5 10 NA
# 7 A 2 10 15 99
# 8 A 2 15 20 NA
# 9 A 2 20 25 NA
#10 A 2 25 30 NA
# … with 20 more rows
Upvotes: 2