Reputation: 2663

Recoding sequentially-named variables based on values of answers

I'm struggling with using lapply to recode values parsimoniously.

Let's say I have 10 survey questions with 4 possible answers each, and each question has exactly one right answer. The questions are labeled q_1 through q_10, and my dataframe is called df. I'd like to create new variables with the same sequential labels that simply code each question as "right" (1) or "wrong" (0).

If I were to make a list of the right answers, it would be:

right_answers<-c(1,2,3,4,2,3,4,1,2,4)

Then, I'm trying to write a function that simply recodes all of the variables into new variables while using the same sequential identifier, such as

lapply(1:10, function(fx) {
  df$know_[fx]<-ifelse(df$q_[fx]==right_answers[fx],1,0)
})

In a hypothetical universe where this code was remotely correct, I'd get results such that:

id   q_1    know_1   q_2   know_2
1    1      1        2     1
2    4      0        3     0
3    3      0        2     1
4    4      0        1     0

Thanks so much for your help!

Upvotes: 0

Answers (4)

eamcvey

Reputation: 693

I'd like to suggest a different approach to your question, using the reshape2 package. In my opinion, this has the advantages of being 1) more idiomatic R (for what that's worth), 2) more readable, and 3) less error-prone, particularly if you want to add analysis in the future. In this approach, everything is done within dataframes, which I think is desirable when possible: it is easier to keep all the values for a single record (id in this case) together and easier to use the power of R tools.

# Creating a dataframe with the form you describe
df <- data.frame(id = c('1','2','3','4'), q_1 = c(1,4,3,4), q_2 = c(2,3,2,1), q_3 = rep(1, 4), q_4 = rep(2, 4), q_5 = rep(3, 4),
                 q_6 = rep(4, 4), q_7 = c(1,4,3,4), q_8 = c(2,3,2,1), q_9 = rep(1, 4), q_10 = rep(2, 4))

right_answers<-c(1,2,3,4,2,3,4,1,2,4)

# Associating the right answers explicitly with the corresponding question labels in a data frame
answer_df <- data.frame(questions=paste('q', 1:10, sep='_'), right_answers)

library(reshape2)

# "Melting" the dataframe from "wide" to "long" form -- now questions labels are in variable values rather than in column names
melt_df <- melt(df, id.vars = 'id') # melt function is from reshape2 package; naming id.vars explicitly avoids relying on melt's guess

# Now merging the correct answers into the data frame containing the observed answers
merge_df <- merge(melt_df, answer_df, by.x='variable', by.y='questions')

# At this point comparing the observed to correct answers is trivial (using as.numeric to convert from logical to 0/1 as you request, though keeping as TRUE/FALSE may be clearer)
merge_df$correct <- as.numeric(merge_df$value==merge_df$right_answers)

# If desirable (not sure it is), put back into "wide" dataframe form
cast_obs_df <- dcast(merge_df, id ~ variable, value.var='value') # dcast function is from reshape2 package
cast_cor_df <- dcast(merge_df, id ~ variable, value.var='correct')
names(cast_cor_df) <- gsub('q_', 'know_', names(cast_cor_df))
final_df <- merge(cast_obs_df, cast_cor_df)

The new tidyr package would probably be even better here than reshape2.

Upvotes: 0

flodel

Reputation: 89097

For the same matrix output as the other answers, I would suggest:

q_names <- paste0("q_", seq_along(right_answers))
answers <- df[q_names]
correct <- mapply(`==`, answers, right_answers)
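
If you want the result attached to the data frame as 0/1 columns named know_1, know_2, and so on, one possible follow-up is sketched below. The two-question df and the shortened right_answers are example values only, and know is just an illustrative name for the intermediate matrix.

```r
# Example data (illustrative values, two questions only)
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)

q_names <- paste0("q_", seq_along(right_answers))
# Logical matrix with one column per question, columns named after q_names
correct <- mapply(`==`, df[q_names], right_answers)

# Convert TRUE/FALSE to 1/0 and rename the columns from q_* to know_*
know <- correct * 1
colnames(know) <- sub("^q_", "know_", colnames(know))

# Bind the new columns onto the original data frame
df <- cbind(df, know)
```

Multiplying by 1 keeps the matrix shape and column names, which as.numeric() would drop.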

Upvotes: 1

aosmith

Reputation: 36084

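You are likely having trouble with this part of the code: df$q_[fx]. The $ operator treats q_ as a literal column name rather than combining it with fx, so it never refers to q_1, q_2, and so on. You can build the column names with paste instead, such as: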

df = read.table(text = "
id   q_1   q_2
1    1     2
2    4     3
3    3     2
4    4     1", header = TRUE)

right_answers = c(1,2,3,4,2,3,4,1,2,4)

dat2 = sapply(1:2, function(fx) {
            ifelse(df[paste("q",fx,sep = "_")]==right_answers[fx],
                      1,0)
})

This doesn't add columns to your data.frame, but instead makes a new matrix much like @SenorO's answer. You can name the columns in the matrix and then add them to the original data.frame as follows.

colnames(dat2) = paste("know", 1:2, sep = "_")

data.frame(df, dat2)
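
If you would rather create the know_ columns directly, closer to what the code in the question attempts, a plain for loop with [[ ]] is one way to do it. This is only a sketch on the same two-question example: q_col and know_col are illustrative names, and right_answers is shortened to match. A loop is used instead of lapply because an assignment to df inside the lapply function changes only a local copy, not the data frame in your workspace.

```r
# Example data (illustrative values, two questions only)
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)   # shortened key for the two example questions

for (i in seq_along(right_answers)) {
  q_col    <- paste0("q_", i)      # existing answer column, e.g. "q_1"
  know_col <- paste0("know_", i)   # new column to create, e.g. "know_1"
  # [[ ]] accepts a column name held in a variable; $ does not
  df[[know_col]] <- as.integer(df[[q_col]] == right_answers[i])
}
```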

Upvotes: 0

Señor O

Reputation: 17432

This should give you a matrix of whether or not each answer was correct:

t(apply(df[,grep("q_", names(df))], 1, function(X) X==right_answers))
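
To see what the one-liner above does, here is a small sketch on a two-question example (the data values, the shortened right_answers, and the names answers, by_row and by_sweep are all illustrative). It also shows sweep() as one possible way to make the same comparison without an explicit row-wise function.

```r
# Example data (illustrative values, two questions only)
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)

# Select the question columns and work on them as a numeric matrix
answers <- as.matrix(df[, grep("^q_", names(df))])

# Row-wise comparison: apply() stores each row's result as a column, hence t()
by_row <- t(apply(answers, 1, function(x) x == right_answers))

# Same comparison with sweep(): each column is compared with its entry in the key
by_sweep <- sweep(answers, 2, right_answers, FUN = "==")
```

Both results are logical matrices with one row per respondent and one column per question.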

Upvotes: 0
