Completing a data frame within groups using combinations from out of group

Question

I need to add missing combinations of variables within groups, but using combinations from outside of that group. Take the following example. Here is a data frame where raters look at samples, and then answer three questions about the sample. Rater A looked a samples 1, 2, and 3, whereas Rater B looked at only samples 1 and 2. Notice also, that Rater B only completed item 7060 and 7930 for sample 1, but did not complete item 7842. I want to turn that implicit missing item scoring into an explicit missing. The goal is to add rows for items that are missing within a sample, but not add samples that are missing within raters (i.e., don't add sample 3 to Rater B, but do add missing items to samples that were rated by Rater B).

library(tidyverse)

df <- data_frame(
  rater_id = c(rep("A", 9), rep("B", 5)),
  sample_id = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(1, 2), rep(2, 3)),
  item_id = c(7060, 7842, 7930, 9571, 4678, 5966, 1758, 3148, 2574,
              7060, 7930, 9571, 4678, 5966),
  score = sample(c(0, 1), size = 14, replace = TRUE)
)
df
#> # A tibble: 14 x 4
#>    rater_id sample_id item_id score
#>                
#>  1 A                1    7060     0
#>  2 A                1    7842     0
#>  3 A                1    7930     1
#>  4 A                2    9571     0
#>  5 A                2    4678     0
#>  6 A                2    5966     1
#>  7 A                3    1758     0
#>  8 A                3    3148     1
#>  9 A                3    2574     0
#> 10 B                1    7060     0
#> 11 B                1    7930     0
#> 12 B                2    9571     0
#> 13 B                2    4678     1
#> 14 B                2    5966     1

I have tried using tidy::complete with little success. Nesting by sample_id and item_id doesn't add the missing row, because all combinations of sample_id and item_id are already present in the data.

df %>% complete(nesting(sample_id, item_id))
#> # A tibble: 14 x 4
#>    sample_id item_id rater_id score
#>                
#>  1         1    7060 A            0
#>  2         1    7060 B            0
#>  3         1    7842 A            0
#>  4         1    7930 A            1
#>  5         1    7930 B            0
#>  6         2    4678 A            0
#>  7         2    4678 B            1
#>  8         2    5966 A            1
#>  9         2    5966 B            1
#> 10         2    9571 A            0
#> 11         2    9571 B            0
#> 12         3    1758 A            0
#> 13         3    2574 A            0
#> 14         3    3148 A            1

Adding rater_id outside of the nesting statement also fails to give the desired output. It successfully adds the missing row for sample 1, but also adds additional missing rows for sample 3. However, because Rater B was never given sample 3, we wouldn't count this as "missing" (even though technically it kind of is).

df %>% complete(rater_id, nesting(sample_id, item_id))
#> # A tibble: 18 x 4
#>    rater_id sample_id item_id score
#>                
#>  1 A                1    7060     0
#>  2 A                1    7842     0
#>  3 A                1    7930     1
#>  4 A                2    4678     0
#>  5 A                2    5966     1
#>  6 A                2    9571     0
#>  7 A                3    1758     0
#>  8 A                3    2574     0
#>  9 A                3    3148     1
#> 10 B                1    7060     0
#> 11 B                1    7842    NA
#> 12 B                1    7930     0
#> 13 B                2    4678     1
#> 14 B                2    5966     1
#> 15 B                2    9571     0
#> 16 B                3    1758    NA
#> 17 B                3    2574    NA
#> 18 B                3    3148    NA

My ideal output would look like this:

#> # A tibble: 18 x 4
#>    rater_id sample_id item_id score
#>                
#>  1 A                1    7060     0
#>  2 A                1    7842     0
#>  3 A                1    7930     1
#>  4 A                2    4678     0
#>  5 A                2    5966     1
#>  6 A                2    9571     0
#>  7 A                3    1758     0
#>  8 A                3    2574     0
#>  9 A                3    3148     1
#> 10 B                1    7060     0
#> 11 B                1    7842    NA
#> 12 B                1    7930     0
#> 13 B                2    4678     1
#> 14 B                2    5966     1
#> 15 B                2    9571     0

Completing a data frame within groups using combinations from out of group

Answers (1)

Related Questions