Jason French

Reputation: 404

Summing Variable Mismatches

I have a dataset where the participants took 12 items twice. I'd like to count the number of times V1 != V2, V3 != V4, and so forth, in order to quantify the degree to which they paid attention.

with(data, 'V1' != 'V2') returns a logical TRUE for the entire dataset. I also tried creating a function for this but I can't get it to operate over the different variables. It also seems like I'm reinventing the wheel given the existence of identical().

score.mismatch <- function(data,...) {
mis <- 0
if (data$V1 != data$V2) {
    mis <- mis + 1
    return(mis)
}
if (data$V3 != data$V4) {
    mis <- mis + 1
    return(mis)
} 
    # And so on
return(mis)
}

Thanks for any feedback and tips.

Upvotes: 2

Views: 76

Answers (2)

Ricardo Saporta

Reputation: 55390

There are two important issues in the code pasted in your question that are likely giving you trouble.

First, the quotes around the variable names in your with statement tell R to compare two literal strings, "V1" and "V2". Without the quotes, R compares the objects called V1 and V2. This example might clarify:

  df <- data.frame(V1=11:13, V2=1:3)

  # df looks like:
  #     V1 V2
  #   1 11  1
  #   2 12  2
  #   3 13  3

  # CORRECT: we paste the values within V1 and V2
  with(df, paste(V1, V2, sep="~"))
  # [1] "11~1" "12~2" "13~3"

  # INCORRECT: we paste the strings "V1" and "V2".
  #            There is no connection between them and df
  with(df, paste("V1", "V2", sep="~"))
  # [1] "V1~V2"
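
The same distinction applies to the != comparison in the question. A small sketch, using a made-up variant of the toy data in which only the second row differs:

```r
# Hypothetical toy data: only row 2 has different values in V1 and V2
df <- data.frame(V1 = 11:13, V2 = c(11L, 2L, 13L))

# Quoted: compares the literal strings "V1" and "V2", giving a single TRUE
quoted <- with(df, "V1" != "V2")

# Unquoted: compares the columns element by element, one result per row
unquoted <- with(df, V1 != V2)

# TRUE counts as 1 when summed, so this is the number of mismatching rows
n_mismatch <- sum(unquoted)
```

Here quoted is a single TRUE regardless of the data, while unquoted has one entry per row.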

Second, within your function, each if clause contains a return statement. That means the function stops as soon as the first condition is TRUE and never checks the remaining pairs. From what you describe, I do not believe you want that behavior.

You likely want to remove the return statements inside the if clauses and keep only the last one. Even more likely, though, you would want to use @DWin's suggestion ;)
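
For illustration, here is a minimal sketch of the function with the early returns removed. It also replaces each if with a sum(), on the assumption that the goal is a total count across all rows: data$V1 != data$V2 yields one value per row, whereas if() expects a single TRUE or FALSE (R 4.2.0 and later raise an error when the condition is longer than one). The toy data frame is made up for the example.

```r
score.mismatch <- function(data) {
  mis <- 0
  # Each comparison is vectorised; sum() counts the rows where the pair differs
  mis <- mis + sum(data$V1 != data$V2)
  mis <- mis + sum(data$V3 != data$V4)
  # And so on for the remaining pairs
  mis  # single return value, reached only after all pairs are checked
}

# Hypothetical data: 3 participants, 2 repeated items
toy <- data.frame(V1 = c(1, 2, 3), V2 = c(1, 5, 3),
                  V3 = c(4, 4, 4), V4 = c(4, 0, 1))
total <- score.mismatch(toy)
```

In this toy data one row differs on the first pair and two rows differ on the second, so total is 3.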

Upvotes: 0

IRTFM

Reputation: 263411

This would give you the same result:

with(data, sum( sum(V1 != V2), sum(V3 != V4) ) )

TRUE is 1 when coerced to numeric. If you want it in a function:

mismat <- function(df){
            # return the count visibly; ending on an assignment would return it invisibly
            with(df, sum( sum(V1 != V2), sum(V3 != V4) ) ) }

There are some issues that can arise when using with() inside functions, which I don't entirely understand, but I do not think they would arise here unless the argument to mismat() lacked columns with those names.
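
With 12 repeated items, writing out every pair by hand gets tedious. One possible generalisation is sketched below, under the assumption that the columns are named V1 to V24 and that each odd-numbered column is paired with the even-numbered one that follows it (V1 with V2, V3 with V4, and so on). The function name count_mismatches and its n_pairs argument are invented for this example. It uses rowSums() to return one count per participant, which may be closer to an attention measure than a single grand total.

```r
count_mismatches <- function(df, n_pairs = 12) {
  first  <- paste0("V", seq(1, 2 * n_pairs, by = 2))  # V1, V3, V5, ...
  second <- paste0("V", seq(2, 2 * n_pairs, by = 2))  # V2, V4, V6, ...
  # Comparing two equally sized data frames yields a logical matrix
  differs <- df[first] != df[second]
  # One count per row (participant); missing answers are skipped here
  rowSums(differs, na.rm = TRUE)
}

# Hypothetical data: 3 participants, 2 repeated items
toy <- data.frame(V1 = c(1, 2, 3), V2 = c(1, 5, 3),
                  V3 = c(4, 4, 4), V4 = c(4, 0, 1))
per_person  <- count_mismatches(toy, n_pairs = 2)
grand_total <- sum(per_person)
```

Summing per_person recovers the overall total that the with() version computes for the same pairs.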

Upvotes: 2
