R dplyr nested dummy coding

Question

I need to recode a data set of test responses for use in another application (a program called BLIMP that imputes missing values). Specifically, I need to represent the test items and subscale assignments with dummy codes.

Here I create a data frame that holds the responses to a 10-item test for two persons in a nested format. These data are a simplified version of the actual input table.

library(tidyverse)
df <- tibble(
  person = rep(101:102, each = 10),
  item = as.factor(rep(1:10, 2)),
  response = sample(1:4, 20, replace = TRUE),
  scale = as.factor(rep(rep(1:2, each = 5), 2))
) %>% mutate(
  scale_last = case_when(
    as.integer(scale) != lead(as.integer(scale)) | is.na(lead(as.integer(scale))) ~ 1,
    TRUE ~ NA_real_
  )
)

The columns of df contain:

person: ID numbers for the persons (10 rows for each person)
item: test items 1-10 for each person. Note how the items are nested within each person.
response: score for each item
scale: the test has two subscales. Items 1-5 are assigned to subscale 1, and items 6-10 are assigned to subscale 2.
scale_last: a code of 1 in this column indicates that the item is the last item in its assigned subscale; all other rows are NA. This characteristic becomes important below.
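
The scale_last flag in the code above compares each row with the next one, which works here because each new person starts on a different subscale than the previous person ended on. A minimal sketch of an alternative, assuming the rows are already sorted by item within each person and subscale, flags the last row of each person/subscale group directly. The name df_flagged and the set.seed() call are illustrative additions, not part of the original code:

```r
library(dplyr)

# Same layout as df above; the seed is only here to make the responses reproducible
set.seed(1)
df <- tibble(
  person = rep(101:102, each = 10),
  item = as.factor(rep(1:10, 2)),
  response = sample(1:4, 20, replace = TRUE),
  scale = as.factor(rep(rep(1:2, each = 5), 2))
)

# Flag the final row within each person/subscale group
# (assumes rows are ordered by item inside each group)
df_flagged <- df %>%
  group_by(person, scale) %>%
  mutate(scale_last = if_else(row_number() == n(), 1, NA_real_)) %>%
  ungroup()
```

With this version the flag does not depend on what the next person's first subscale happens to be.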

I then create dummy codes for the items using the recipes package.

library(recipes)
dum <- df %>% 
  recipe(~ .) %>% 
  step_dummy(item, one_hot = TRUE) %>% 
  prep(training = df) %>%
  bake(new_data = df)
print(dum, width = Inf)

#   person response scale scale_last item_X1 item_X2 item_X3 item_X4 item_X5 item_X6 item_X7
# 1    101        2 1             NA       1       0       0       0       0       0       0
# 2    101        3 1             NA       0       1       0       0       0       0       0
# 3    101        3 1             NA       0       0       1       0       0       0       0
# 4    101        1 1             NA       0       0       0       1       0       0       0
# 5    101        1 1              1       0       0       0       0       1       0       0
# 6    101        1 2             NA       0       0       0       0       0       1       0
# 7    101        3 2             NA       0       0       0       0       0       0       1
# 8    101        4 2             NA       0       0       0       0       0       0       0
# 9    101        2 2             NA       0       0       0       0       0       0       0
#10    101        4 2              1       0       0       0       0       0       0       0
#11    102        2 1             NA       1       0       0       0       0       0       0
#12    102        1 1             NA       0       1       0       0       0       0       0
#13    102        2 1             NA       0       0       1       0       0       0       0
#14    102        3 1             NA       0       0       0       1       0       0       0
#15    102        2 1              1       0       0       0       0       1       0       0
#16    102        1 2             NA       0       0       0       0       0       1       0
#17    102        4 2             NA       0       0       0       0       0       0       1
#18    102        2 2             NA       0       0       0       0       0       0       0
#19    102        4 2             NA       0       0       0       0       0       0       0
#20    102        3 2              1       0       0       0       0       0       0       0
#   item_X8 item_X9 item_X10
# 1       0       0        0
# 2       0       0        0
# 3       0       0        0
# 4       0       0        0
# 5       0       0        0
# 6       0       0        0
# 7       0       0        0
# 8       1       0        0
# 9       0       1        0
#10       0       0        1
#11       0       0        0
#12       0       0        0
#13       0       0        0
#14       0       0        0
#15       0       0        0
#16       0       0        0
#17       0       0        0
#18       1       0        0
#19       0       1        0
#20       0       0        1

The output shows the item dummy codes represented in the columns with the item_ prefix. For downstream processing, I need a further level of recoding. Within each subscale, the items must be dummy-coded relative to the last item of the subscale. Here’s where the scale_last variable comes into play; this variable identifies the rows in the output that need to be recoded.

For example, the first of these rows is row 5, the row for the last item (item 5) in subscale 1 for person 101. In this row the value of column item_X5 needs to be recoded from 1 to 0. In the next row to be recoded (row 10), it is the value of item_X10 that needs to be recoded from 1 to 0. And so on.

I’m struggling for the right combination of dplyr verbs to accomplish this. What’s tripping me up is the need to isolate specific cells within specific rows to be recoded.

Thanks in advance for any help!

Ronak Shah · Accepted Answer

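We can use mutate_at with replace() to set every "item" column to 0 in the rows where scale_last == 1. Each of those rows has a 1 only in the column for the last item of its subscale, so zeroing all the item columns in the row changes exactly the cell that needs recoding: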

library(dplyr)

dum %>% mutate_at(vars(starts_with("item")), ~replace(., scale_last == 1, 0))

# A tibble: 20 x 14
#   person response scale scale_last item_X1 item_X2 item_X3 item_X4 item_X5
# 1    101        2 1             NA       1       0       0       0       0
# 2    101        3 1             NA       0       1       0       0       0
# 3    101        1 1             NA       0       0       1       0       0
# 4    101        1 1             NA       0       0       0       1       0
# 5    101        3 1              1       0       0       0       0       0
# 6    101        4 2             NA       0       0       0       0       0
# 7    101        4 2             NA       0       0       0       0       0
# 8    101        3 2             NA       0       0       0       0       0
# 9    101        2 2             NA       0       0       0       0       0
#10    101        4 2              1       0       0       0       0       0
#11    102        2 1             NA       1       0       0       0       0
#12    102        1 1             NA       0       1       0       0       0
#13    102        4 1             NA       0       0       1       0       0
#14    102        4 1             NA       0       0       0       1       0
#15    102        4 1              1       0       0       0       0       0
#16    102        3 2             NA       0       0       0       0       0
#17    102        4 2             NA       0       0       0       0       0
#18    102        1 2             NA       0       0       0       0       0
#19    102        4 2             NA       0       0       0       0       0
#20    102        4 2              1       0       0       0       0       0
# … with 5 more variables: item_X6, item_X7, item_X8,
#   item_X9, item_X10
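
In dplyr 1.0.0 and later, the scoped verbs such as mutate_at are superseded by across(). Here is a sketch of the same recode with across(), run on a small hypothetical stand-in for dum (the toy tibble is made up for illustration). Using scale_last %in% 1 rather than scale_last == 1 is a design choice: it turns the NA rows into plain FALSE in the index instead of NA.

```r
library(dplyr)

# Hypothetical miniature of `dum`: four items, rows 2 and 4 close a subscale
toy <- tibble(
  scale_last = c(NA, 1, NA, 1),
  item_X1 = c(1, 0, 0, 0),
  item_X2 = c(0, 1, 0, 0),
  item_X3 = c(0, 0, 1, 0),
  item_X4 = c(0, 0, 0, 1)
)

# Zero out every item column in the rows flagged by scale_last
recoded <- toy %>%
  mutate(across(starts_with("item_"), ~ replace(.x, scale_last %in% 1, 0)))
```

On the real data, the same mutate(across(...)) call should apply to dum unchanged.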

In base R, we can use lapply to apply the same replacement to each item column:

cols <- grep("^item", names(dum))
dum[cols] <- lapply(dum[cols], function(x) replace(x, dum$scale_last == 1, 0))
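
One way to check the result, sketched here on a hypothetical four-item stand-in for dum: after the recode, the item columns should sum to 0 in every row flagged by scale_last and to 1 in every other row. The toy data frame and the name item_sums are illustrative only.

```r
# Hypothetical miniature of `dum`: rows 2 and 4 are the last items of their subscales
dum <- data.frame(
  scale_last = c(NA, 1, NA, 1),
  item_X1 = c(1, 0, 0, 0),
  item_X2 = c(0, 1, 0, 0),
  item_X3 = c(0, 0, 1, 0),
  item_X4 = c(0, 0, 0, 1)
)

# Same base R recode as above, with %in% so NA flags count as FALSE
cols <- grep("^item", names(dum))
flagged <- dum$scale_last %in% 1
dum[cols] <- lapply(dum[cols], function(x) replace(x, flagged, 0))

# Reference (last) items are all zeros; every other row keeps a single 1
item_sums <- rowSums(dum[cols])
stopifnot(all(item_sums[flagged] == 0), all(item_sums[!flagged] == 1))
```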
