mina

Reputation: 195

Apply a function with two dataframes as input in R

I want to get the total number of NAs that mismatch between two dataframes. I have found a way to get this for two vectors, as follows:

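compareNA <- function(v1, v2) {
  # TRUE where the values are equal or both are NA
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  # comparing a value with a single NA yields NA; count that as a mismatch
  same[is.na(same)] <- FALSE
  n <- 0
  for (i in seq_along(same)) {
    if (!same[i]) {
      n <- n + 1
    }
  }
  return(n)
}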

Let's say I have vectors a and b. Comparing them gives 2 as a result:

 a <- c(1,2,NA, 4,5,6,NA,8)
 b <- c(NA,2,NA, 4,NA,6,NA,8)
 h <- compareNA(a,b)
 h
[1] 2

My question is: how to apply this function for dataframes instead of vectors?

Taking these dataframes as an example:

a2 <- c(1,2,NA,NA,NA,6,NA,8)
b2 <- c(1,NA,NA,4,NA,6,NA,NA)

df1 <- data.frame(a,b)
df2 <- data.frame(a2,b2)

What I expect as a result is 5, since that is the total number of positions where one dataframe has an NA and the other does not (comparing a with a2 and b with b2). Any suggestions on how to make this work?

Upvotes: 0

Views: 1174

Answers (3)

Roman Luštrik

Reputation: 70633

Here's a second thought.

xy1 <- data.frame(a = c(NA, 2, 3), b = rnorm(3))
xy2 <- data.frame(a = c(NA, 2, 4), b = rnorm(3))

com <- intersect(colnames(xy1), colnames(xy2))

sum(xy1[, com] == xy2[, com], na.rm = TRUE)

If you don't want to worry about column names (but you should), you can make sure the columns align perfectly. In that case, the intersect step is redundant.

sum(xy1 == xy2, na.rm = TRUE)
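Note that both sums above count the cells that are equal and non-NA in both dataframes, not the mismatches. Below is a minimal sketch of the mismatch count the question asks for. It assumes that both dataframes have the same dimensions, that all columns are numeric and line up by position, and that a pair of NAs counts as a match. The names m1, m2, same and n_mismatch are illustrative only.

```r
# Data from the question
a  <- c(1, 2, NA, 4, 5, 6, NA, 8)
b  <- c(NA, 2, NA, 4, NA, 6, NA, 8)
a2 <- c(1, 2, NA, NA, NA, 6, NA, 8)
b2 <- c(1, NA, NA, 4, NA, 6, NA, NA)
df1 <- data.frame(a, b)
df2 <- data.frame(a2, b2)

# Drop the column names so the comparison is purely positional
m1 <- unname(as.matrix(df1))
m2 <- unname(as.matrix(df2))

# TRUE where both cells are equal, or both are NA
same <- (m1 == m2) | (is.na(m1) & is.na(m2))
# Cells where exactly one side is NA come out as NA; treat them as mismatches
same[is.na(same)] <- FALSE

n_mismatch <- sum(!same)
n_mismatch
# [1] 5
```

Working on matrices keeps the whole comparison vectorized, at the cost of requiring a single column type across the dataframe.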

Upvotes: 2

Sandipan Dey

Reputation: 23099

A third way (assuming the dimensions of df1 and df2 are the same):

sum(sapply(seq_len(ncol(df1)), function(x) compareNA(df1[, x], df2[, x])))
# 5
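The same column-by-column idea can probably be written without indexing at all, since mapply walks over the columns of two dataframes in parallel. The following is a sketch under the same assumption of equal dimensions. compareNA is repeated here in a condensed, loop-free form so the block runs on its own, and per_column and total are illustrative names.

```r
# Condensed version of compareNA from the question (same result, no loop)
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  sum(!same)
}

# Data from the question
a  <- c(1, 2, NA, 4, 5, 6, NA, 8)
b  <- c(NA, 2, NA, 4, NA, 6, NA, 8)
a2 <- c(1, 2, NA, NA, NA, 6, NA, 8)
b2 <- c(1, NA, NA, 4, NA, 6, NA, NA)
df1 <- data.frame(a, b)
df2 <- data.frame(a2, b2)

# Fail early if the dataframes cannot be paired column by column
stopifnot(identical(dim(df1), dim(df2)))

# One mismatch count per pair of columns, matched by position
per_column <- mapply(compareNA, df1, df2)
total <- sum(per_column)
total
# [1] 5
```

Keeping per_column around also shows which columns contribute the mismatches (2 for the first pair, 3 for the second in this example).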

Upvotes: 0

Choubi

Reputation: 680

It would be easier to force both dataframes to have the same column names and compare column by column when those have the same name. You can then simply use a loop over columns and increment a running total by applying your function.

compareNA.df <- function(df1, df2) {

   total <- 0
   common_columns <- intersect(colnames(df1), colnames(df2))

   for (col in common_columns) {

      total <- total + compareNA(df1[[col]], df2[[col]])

   }
   return(total)
}

colnames(df2) <- c("a", "b")

compareNA.df(df1, df2)
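To illustrate why the renaming step matters, here is a small sketch. It assumes the question's data and a condensed, loop-free compareNA, and compareNA.df is repeated so the block is self-contained. Without common column names the intersect is empty and the total stays at 0. Passing a renamed copy via setNames gives the expected count without modifying df2 itself. The names no_common and aligned are illustrative.

```r
# Condensed version of compareNA from the question (same result, no loop)
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  sum(!same)
}

# Sum the mismatches over columns that share a name
compareNA.df <- function(df1, df2) {
  total <- 0
  common_columns <- intersect(colnames(df1), colnames(df2))
  for (col in common_columns) {
    total <- total + compareNA(df1[[col]], df2[[col]])
  }
  total
}

# Data from the question
a  <- c(1, 2, NA, 4, 5, 6, NA, 8)
b  <- c(NA, 2, NA, 4, NA, 6, NA, 8)
a2 <- c(1, 2, NA, NA, NA, 6, NA, 8)
b2 <- c(1, NA, NA, 4, NA, 6, NA, NA)
df1 <- data.frame(a, b)
df2 <- data.frame(a2, b2)

# No shared column names: nothing is compared
no_common <- compareNA.df(df1, df2)
no_common
# [1] 0

# Rename a copy of df2 on the fly; df2 keeps its original names
aligned <- compareNA.df(df1, setNames(df2, names(df1)))
aligned
# [1] 5
```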

Upvotes: 0
