Evgenia

Reputation: 1

Function to recode multiple variables conditional on other variables

I have a dataset with multiple variables. For each question there are four variables: the actual survey answer and three other characteristics. Where Q135_L == 1, I want to leave Q135_RT as it is; otherwise I want to code it as NA. I can do that with an ifelse statement:

df$Q135_RT <- ifelse(df$Q135_L == 1, df$Q135_RT, NA)

However, I have hundreds of variables and the names are not related. For example, in the picture we can see Q135, SG1_1 and so on. How can I specify, for the whole dataset, that when a variable ending in _L equals 1, the matching variable ending in _RT stays as it is, and otherwise the variable ending in _RT is coded as NA? I tried this, but it only returns NAs:

ifelse(grepl("//b_L" ==1, df), "//b_RT" , NA)

Upvotes: 0

Views: 444

Answers (1)

Abdur Rohman

Reputation: 2944

If I understand your problem correctly, you have a data frame whose columns represent survey question variables. Each column name contains two identifiers: a survey question number (134, 135, etc.) and a variable suffix (L, RT, etc.). Because you provided no reproducible example, I made a simplified example of your data frame:

set.seed(5)
DF <- data.frame(array(sample(1:4, 24, replace = TRUE), c(4,6)))
colnames(DF) <- c("Q134_L","Q135_L", "Q134_RT", "Q135_RT", "Q_L1", "Q134_S")
DF
#   Q134_L Q135_L Q134_RT Q135_RT Q_L1 Q134_S
# 1      2      3       2       3    1      1
# 2      3      1       3       2    4      4
# 3      1      1       3       2    4      3
# 4      3      1       3       3    2      1

You want to keep Q135_RT unchanged in the rows where Q135_L == 1 and set it to NA in all other rows. Here is a function that implements this recoding logic:

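recode <- function(yourdf, questnums) {
  charnum <- as.character(questnums)
  for (k in seq_along(charnum)) {
    # Columns whose names contain the k-th question number and end in _L / _RT
    col_end_L_k <- yourdf[grepl("_L\\b", colnames(yourdf)) &
      grepl(charnum[k], colnames(yourdf))]
    col_end_R_k <- yourdf[grepl("_RT\\b", colnames(yourdf)) &
      grepl(charnum[k], colnames(yourdf))]
    # Skip numbers that do not identify exactly one _L and one _RT column
    if (ncol(col_end_L_k) != 1 || ncol(col_end_R_k) != 1) next
    # Rows where the _L column equals 1
    row_is_1 <- which(col_end_L_k == 1)
    # All other rows; unlike a negative index, setdiff() still works
    # when no row equals 1
    row_not_1 <- setdiff(seq_len(nrow(yourdf)), row_is_1)
    col_end_R_k[[1]][row_not_1] <- NA
    yourdf[, colnames(col_end_R_k)] <- col_end_R_k
  }
  return(yourdf)
}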

This function takes a data frame and a vector of question numbers, and then returns the data frame that has been recoded.

What this function does:

  1. It loops over the question numbers with for.
  2. It uses grepl to identify the column whose name contains the selected number and ends in _L.
  3. It does the same for the column whose name ends in _RT.
  4. It uses which to find the rows of the _L column that contain 1.
  5. It keeps the values of the _RT column (the one with the same question number as the _L column) in those rows and changes the values in all other rows to NA.
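
The steps above can be tried in isolation. This is only a minimal sketch: the name vector nms and the two value vectors are made up for illustration and stand in for the column names and for one _L/_RT pair of the example data frame.

```r
# Hypothetical column names, mirroring the example data frame
nms <- c("Q134_L", "Q135_L", "Q134_RT", "Q135_RT", "Q_L1", "Q134_S")

# Step 2: names that contain the question number and end in _L
is_L <- grepl("_L\\b", nms) & grepl("135", nms)
# Step 3: the same test for names that end in _RT
is_RT <- grepl("_RT\\b", nms) & grepl("135", nms)

nms[is_L]   # "Q135_L"
nms[is_RT]  # "Q135_RT"

# Steps 4 and 5 on a toy pair of vectors (made-up values)
L_vals  <- c(3, 1, 1, 1)
RT_vals <- c(3, 2, 2, 3)
keep <- which(L_vals == 1)   # rows 2, 3 and 4
RT_vals[-keep] <- NA         # every other row becomes NA
RT_vals     # NA 2 2 3
```

The negative index in the last step only works when at least one row equals 1; with an empty keep it selects nothing, so no value would be set to NA.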

The result:

recode(DF, 134:135)
#   Q134_L Q135_L Q134_RT Q135_RT Q_L1 Q134_S
# 1      2      3      NA      NA    1      1
# 2      3      1      NA       2    4      4
# 3      1      1       3       2    4      3
# 4      3      1      NA       3    2      1

Note that the Q_L1 column is not affected because the _L in this column is not at the end of the column name.

As for how to define questnums, the question numbers, you just need to create a numeric vector. Examples:

  • Your questnums are 1 to 200. Then use 1:200 or seq(200), so recode(DF, 1:200).
  • Your questnums are 1, 3, 134, 135. Then, use recode(DF, c(1, 3, 134, 135)).
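  • You can also assign the question numbers to an object first, such as n = c(25, 135, 145), and then use it: recode(DF, n).

Keep in mind that grepl matches each number as a substring of the column names, so a number such as 13 also matches Q134 and Q135. Each number you pass should therefore identify exactly one _L/_RT pair.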
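
If the variable names do not share a numbering scheme (Q135, SG1_1 and so on), one possible alternative is to pair the columns by name instead of by question number. The sketch below is not the function above but a different approach, and it rests on an assumption: each _RT column has a partner whose name is identical except that it ends in _L. The names recode_by_suffix and toy are hypothetical, and the data are made up.

```r
# Sketch: set every *_RT column to NA wherever its *_L partner is not 1.
# Assumes partner columns share everything before the suffix
# (e.g. Q135_L / Q135_RT, SG1_1_L / SG1_1_RT).
recode_by_suffix <- function(df) {
  l_cols <- grep("_L$", names(df), value = TRUE)
  for (l_col in l_cols) {
    rt_col <- sub("_L$", "_RT", l_col)
    if (!rt_col %in% names(df)) next   # no partner column: leave untouched
    # Logical mask; a missing value in the _L column counts as "not 1"
    keep <- !is.na(df[[l_col]]) & df[[l_col]] == 1
    df[[rt_col]][!keep] <- NA
  }
  df
}

# Made-up data: SG1_1_L never equals 1, Q_L1 does not end in _L
toy <- data.frame(
  Q135_L  = c(3, 1, 1, 1), Q135_RT  = c(3, 2, 2, 3),
  SG1_1_L = c(2, 2, 2, 2), SG1_1_RT = c(5, 6, 7, 8),
  Q_L1    = c(1, 4, 4, 2)
)
recode_by_suffix(toy)
```

Because the rows are selected with a logical mask rather than a negative index, a pair in which no row equals 1 (SG1_1 here) ends up entirely NA, and columns such as Q_L1 that do not end in _L are ignored.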

Upvotes: 0
