Reputation: 1
I have a dataset with multiple variables. Each question has the actual survey answer and three other characteristics, so there are four variables for each question. I want to specify that if Q135_L == 1, Q135_RT is left as it is, and otherwise it is coded as NA. I can do that with an ifelse statement:
df$Q135_RT <- ifelse(df$Q135_L == 1, df$Q135_RT, NA)
However, I have hundreds of variables and the names are not related. For example, in the picture we can see Q135, SG1_1 and so on. How can I specify for the whole dataset that if a variable ending in _L equals 1, the matching variable ending in _RT should remain as it is, and otherwise the variable ending in _RT should be coded as NA?
I tried this, but it only returns NAs:
ifelse(grepl("//b_L" ==1, df), "//b_RT" , NA)
Upvotes: 0
Views: 444
Reputation: 2944
If I understand your problem correctly, you have a data frame whose columns represent survey question variables. Each column name contains two identifiers: a survey question number (134, 135, etc.) and a variable suffix (L, RT, etc.). Because you did not provide a reproducible example, I made a simplified version of your data frame:
set.seed(5)
DF <- data.frame(array(sample(1:4, 24, replace = TRUE), c(4,6)))
colnames(DF) <- c("Q134_L", "Q135_L", "Q134_RT", "Q135_RT", "Q_L1", "Q134_S")
DF
#   Q134_L Q135_L Q134_RT Q135_RT Q_L1 Q134_S
# 1      2      3       2       3    1      1
# 2      3      1       3       2    4      4
# 3      1      1       3       2    4      3
# 4      3      1       3       3    2      1
What you want is that if Q135_L == 1, leave Q135_RT as it is, otherwise code it as NA. Here is a function that implements this recoding logic:
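recode <- function(yourdf, questnums) {
  charnum <- as.character(questnums)
  nms <- colnames(yourdf)
  for (k in seq_along(charnum)) {
    # Match the whole question number, so that "1" does not also match "134"
    has_num <- grepl(paste0("(^|[^0-9])", charnum[k], "_"), nms)
    col_L  <- nms[has_num & grepl("_L$", nms)]
    col_RT <- nms[has_num & grepl("_RT$", nms)]
    # Skip numbers that do not have exactly one _L and one _RT column
    if (length(col_L) != 1 || length(col_RT) != 1) next
    # Rows where the _L column is 1 (which() drops NAs)
    row_is_1 <- which(yourdf[[col_L]] == 1)
    # Set every other row of the _RT column to NA, even if no row is 1
    not_1 <- !(seq_len(nrow(yourdf)) %in% row_is_1)
    yourdf[[col_RT]][not_1] <- NA
  }
  return(yourdf)
}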
This function takes a data frame and a vector of question numbers, and then returns the data frame that has been recoded.
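What this function does:
1. Loops over the selected question numbers using for.
2. Uses grepl to identify any column that contains the selected number and contains _L at the end of the column name.
3. Uses grepl in the same way to identify the column that contains the selected number and contains _RT at the end of the column name.
4. Uses which to identify the location of rows in the _L column that contain 1.
5. Keeps the values of the _RT column, which has the same question number as the corresponding _L column, in those rows, and changes the values in the other rows to NA.
The result: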
recode(DF, 134:135)
#   Q134_L Q135_L Q134_RT Q135_RT Q_L1 Q134_S
# 1      2      3      NA      NA    1      1
# 2      3      1      NA       2    4      4
# 3      1      1       3       2    4      3
# 4      3      1      NA       3    2      1
Note that the Q_L1 column is not affected because the _L in this column name is not located at the end of the name.
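To see why Q_L1 is left alone, it can help to test the suffix pattern directly on a few column names. This is only a small sketch; the vector nms is made up for illustration.

```r
# Hypothetical column names, for illustration only
nms <- c("Q134_L", "Q135_RT", "Q_L1", "Q134_S")

# "$" anchors the match at the end of the name, so "_L" must be the suffix
ends_in_L <- grepl("_L$", nms)
nms[ends_in_L]
# [1] "Q134_L"

# "\\b" (word boundary) selects the same names here: in "Q_L1" the "1"
# after "_L" is a word character, so there is no boundary to match
ends_in_L_b <- grepl("_L\\b", nms)
identical(ends_in_L, ends_in_L_b)
```

The "$" form is the stricter of the two, since it only accepts "_L" as the very last characters of the name.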
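As for how to define questnums, the question numbers, you just need to create a numeric vector. Examples:
1. For a range of question numbers, use 1:200 or seq(200), so recode(DF, 1:200).
2. For specific question numbers, list them directly: recode(DF, c(1, 3, 134, 135)).
3. Or store the numbers in a variable first, n = c(25, 135, 145), and then use it: recode(DF, n).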
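If the column names do not all follow a question-number scheme (the question mentions names such as SG1_1), one alternative is to derive the pairs from the column names themselves instead of passing a vector of numbers. The following is a minimal sketch of that idea, not a replacement for the function above. The names recode_by_suffix and toy are invented here, and the sketch assumes that every _RT column to be recoded has a partner with the same prefix ending in _L.

```r
# Hypothetical helper: pair every "<prefix>_L" column with "<prefix>_RT"
recode_by_suffix <- function(df) {
  # Prefixes of all columns whose name ends in "_L", e.g. "Q135" or "SG1_1"
  l_cols   <- grep("_L$", colnames(df), value = TRUE)
  prefixes <- sub("_L$", "", l_cols)
  for (p in prefixes) {
    col_L  <- paste0(p, "_L")
    col_RT <- paste0(p, "_RT")
    # Skip prefixes that have no matching _RT column
    if (!col_RT %in% colnames(df)) next
    # Keep _RT where _L is 1; %in% treats a missing _L value as "not 1"
    keep <- df[[col_L]] %in% 1
    df[[col_RT]][!keep] <- NA
  }
  df
}

# Made-up data, for illustration only
toy <- data.frame(
  Q135_L   = c(1, 2, NA),
  Q135_RT  = c(10, 20, 30),
  SG1_1_L  = c(3, 1, 1),
  SG1_1_RT = c(5, 6, 7),
  Q_L1     = c(1, 1, 1)
)
out <- recode_by_suffix(toy)
out
```

Here Q135_RT keeps only its first value and SG1_1_RT keeps its last two, while Q_L1 is untouched because its name does not end in _L.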
Upvotes: 0