TTS
TTS

Reputation: 1928

use tidyr::complete on multiple columns

I am almost positive I can use tidyr::complete to fill in missing records, but I haven't been able to get it to work properly through using expand or nesting. I haven't found any examples that have been very helpful. I have used complete for single columns, but never multiple.

I have a dataset with group, subgroup, low_end, and high_end. Low and high are always 5 apart. For every possible pairing of group, subgroup, low_end+high_end combination, I need a minimum of 1 record in the data. Not all possible combinations of low_end and high_end will be in the data (in this reprex 20/25).

df <- data.frame(
  group = c(rep('A', 2), rep('B', 2), rep('C', 1)),
  subgroup = c(1,2,2,2,1),
  low_end = c(5, 10, 10, 15, 25),
  high_end = c(10, 15, 15, 20, 30),
  value = c(rep(99, 5))
)
  group subgroup low_end high_end value
1     A        1       5       10    99
2     A        2      10       15    99
3     B        2      10       15    99
4     B        2      15       20    99
5     C        1      25       30    99

Desired Output:

desired_result <- data.frame(
  group = c(rep('A', 10), rep('B', 10), rep('C', 10)),
  subgroup = c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5)),
  low_end = c(rep(full_seq(x=c(5,25), 5), 6)),
  high_end = c(rep(full_seq(x=c(10,30), 5), 6)),
  value = c(99, rep(NA, 5), 99, rep(NA, 9), 99, 99, rep(NA, 6), 99, rep(NA, 5))
  )
   group subgroup low_end high_end value
1      A        1       5       10    99
2      A        1      10       15    NA
3      A        1      15       20    NA
4      A        1      20       25    NA
5      A        1      25       30    NA
6      A        2       5       10    NA
7      A        2      10       15    99
8      A        2      15       20    NA
9      A        2      20       25    NA
10     A        2      25       30    NA
11     B        1       5       10    NA
12     B        1      10       15    NA
13     B        1      15       20    NA
14     B        1      20       25    NA
15     B        1      25       30    NA
16     B        2       5       10    NA
17     B        2      10       15    99
18     B        2      15       20    99
19     B        2      20       25    NA
20     B        2      25       30    NA
21     C        1       5       10    NA
22     C        1      10       15    NA
23     C        1      15       20    NA
24     C        1      20       25    NA
25     C        1      25       30    99
26     C        2       5       10    NA
27     C        2      10       15    NA
28     C        2      15       20    NA
29     C        2      20       25    NA
30     C        2      25       30    NA

The documentation says to use expand to include values missing from the data, so I figure it has to be something close to...

attempt <- df %>%
  complete(expand(group, subgroup, low_end, high_end), fill= NA)

but again, I'm clearly missing something important here.

Upvotes: 3

Views: 948

Answers (1)

akrun
akrun

Reputation: 887971

We could do the seq inside the nesting

library(dplyr)
library(tidyr)
df %>% 
    complete(group, subgroup, nesting(low_end = seq(min(low_end), 
        max(low_end), by = 5), 
       high_end = seq(min(high_end), max(high_end), by = 5)))

-output

# A tibble: 30 x 5
#   group subgroup low_end high_end value
#   <chr>    <dbl>   <dbl>    <dbl> <dbl>
# 1 A            1       5       10    99
# 2 A            1      10       15    NA
# 3 A            1      15       20    NA
# 4 A            1      20       25    NA
# 5 A            1      25       30    NA
# 6 A            2       5       10    NA
# 7 A            2      10       15    99
# 8 A            2      15       20    NA
# 9 A            2      20       25    NA
#10 A            2      25       30    NA
# … with 20 more rows

Upvotes: 2

Related Questions