Compare values in data frame and return results

Question

I have two data frames that I need to compare and generate an output of the comparison results. The dimensions are identical. Column and row orders match. I would like to compare each corresponding cell between the two data frames and determine whether they contain the same value or a different value. If the value is different, I need to know if both values belong to a particular vector I define or if they come from 2 different vectors. I've provided example code below.

I haven't been able to find anything in the forums that does exactly what I need, mainly because when the values differ I also need to know how they differ, based on criteria I provide.

#Possible Value Types for the Data Frames
typeA = c("Green", "Blue", "Purple")
typeB = c("Red", "Orange", "Yellow")

#Create Data Frames to Compare
df1 = as.data.frame(cbind(rbind("Green","Red","Yellow"), 
            rbind("Green", "Purple", "Red"), 
            rbind("Orange", "Orange",NA), 
            rbind(NA,"Red","Purple")))

df2 = as.data.frame(cbind(rbind("Green","Red","Yellow"), 
                          rbind(NA, "Purple", "Yellow"), 
                          rbind("Blue", "Orange",NA), 
                          rbind("Blue","Red","Green")))

#Data frames compared must have identical dimensions
###INSERT FUNCTION HERE
myfunction = function(df1,df2){
  #compare corresponding cells and provide output based on match
  #example: compare cell df1[1,1] to df2[1,1]
  #if either df1[1,1] or df2[1,1] is NA then return NA, else...
    #if df1[1,1] matches df2[1,1] then return "Match"
    #if df1[1,1] does not match df2[1,1] but they are both in the same vector (both in typeA or both in typeB) then return "SAMEGROUP"
    #if df1[1,1] does not match df2[1,1] and one is in vector typeA and the other in typeB then return "DIFFGROUP"
}

###RUN FUNCTION
df.out = myfunction(df1,df2)

#expected output
#Match: The values in df1 and df2 for that cell are identical
#SAMEGROUP: The values in df1 and df2 for that cell are different, but
##they come from the same group (typeA or typeB)
#DIFFGROUP: The values in df1 and df2 for that cell are different, and
##they come from different groups (one from typeA, one from typeB)
#NA: One or both of the corresponding cells in df1 or df2 has an NA

df.out = as.data.frame(cbind(rbind("Match","Match","Match"), 
                          rbind(NA, "Match", "SAMEGROUP"), 
                          rbind("DIFFGROUP", "Match",NA), 
                          rbind(NA,"Match","SAMEGROUP")))

Thank you!

jarfa · Accepted Answer

First, to enforce your dimensionality condition:

stopifnot(identical(dim(df1), dim(df2)))
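
A note on checking dimensions: all.equal() returns a character description of the difference rather than FALSE when its arguments differ, so as a condition it is usually wrapped in isTRUE(); identical() is a stricter alternative. A small sketch, where d1 and d2 are made-up dimension vectors rather than anything from the question:

```r
# Illustrative dimension vectors (not taken from the question's data)
d1 <- c(3L, 4L)
d2 <- c(3L, 5L)

res <- all.equal(d1, d2)   # a character description of the mismatch, not FALSE
print(is.character(res))   # TRUE
print(isTRUE(res))         # FALSE, so isTRUE(all.equal(...)) is safe as a condition
print(identical(d1, d2))   # FALSE, a plain logical
```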

For the meat of your function: a naive, slow approach would be something like:

for (i in seq_len(nrow(df1))) {
  for (j in seq_len(ncol(df1))) {
    # complicated if/else statement(s) comparing df1[i, j] with df2[i, j]
  }
}

But this is easily vectorized. See:

a = matrix(1:9, 3)
b = matrix(c(1:8, -1),3)
ifelse(a == b, 'match', 'nomatch')

Your if/else will definitely be more complicated, but I think you can figure it out from there. It will be some assortment of nested ifelse() calls.
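
One thing that probably helps with the NA rule in the question: `==` gives NA whenever either side is NA, and ifelse() passes that NA straight through, so the "return NA" case should not need its own branch. A minimal sketch with toy character matrices (the values are illustrative, not the question's data):

```r
# Toy character matrices, filled column by column
a <- matrix(c("Green", "Red", NA, "Blue"), nrow = 2)
b <- matrix(c("Green", "Yellow", "Red", NA), nrow = 2)

# a == b is TRUE, FALSE, NA, NA; ifelse() keeps the NA cells as NA
res <- ifelse(a == b, "match", "nomatch")
print(res)
```

The result keeps the 2 x 2 shape of the comparison, with NA in the cells where either input was NA.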

Edit: Make a function that will return the group of a given value. Then, the statement

groupfun(a) == groupfun(b)

should just return a matrix of TRUE and FALSE values, which will be easy to use.
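
Putting the pieces together, here is one way it could look. This is a sketch under assumptions, not the only way to do it: groupfun and compare_cells are hypothetical names, the inputs are converted to character matrices first (so factor columns with different levels do not break `==`), and a value that belongs to neither typeA nor typeB would come out as NA.

```r
typeA <- c("Green", "Blue", "Purple")
typeB <- c("Red", "Orange", "Yellow")

# Same data as in the question, filled column by column
df1 <- as.data.frame(matrix(c("Green", "Red", "Yellow",  "Green", "Purple", "Red",
                              "Orange", "Orange", NA,    NA, "Red", "Purple"), nrow = 3))
df2 <- as.data.frame(matrix(c("Green", "Red", "Yellow",  NA, "Purple", "Yellow",
                              "Blue", "Orange", NA,      "Blue", "Red", "Green"), nrow = 3))

# Hypothetical helper: label each value "A", "B", or NA (missing or in neither vector)
groupfun <- function(x) {
  ifelse(x %in% typeA, "A", ifelse(x %in% typeB, "B", NA_character_))
}

# Hypothetical name for the function the question asks for
compare_cells <- function(df1, df2) {
  stopifnot(identical(dim(df1), dim(df2)))
  m1 <- as.matrix(df1)   # character matrices; NA cells stay NA
  m2 <- as.matrix(df2)
  # %in% drops the matrix shape, so this is a plain vector in column-major order
  same_group <- groupfun(m1) == groupfun(m2)
  # The outer test (m1 == m2) is a matrix, so the result keeps that shape;
  # cells where either input is NA come out as NA
  out <- ifelse(m1 == m2, "Match",
                ifelse(same_group, "SAMEGROUP", "DIFFGROUP"))
  as.data.frame(out, stringsAsFactors = FALSE)
}

df.out <- compare_cells(df1, df2)
print(df.out)
```

On the question's data this should reproduce the expected df.out, including the NA cells.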
