Reputation: 195
I want to get the total number of NA values that mismatch between two data frames.
I have found the way to get this for two vectors as follows:
compareNA <- function(v1, v2) {
  # TRUE where both values are equal or both are NA
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  # a comparison with exactly one NA yields NA; count it as a mismatch
  same[is.na(same)] <- FALSE
  # number of mismatching positions
  sum(!same)
}
Let's say I have vectors a and b. Comparing them gives 2 as the result:
a <- c(1,2,NA, 4,5,6,NA,8)
b <- c(NA,2,NA, 4,NA,6,NA,8)
h <- compareNA(a,b)
h
[1] 2
My question is: how can I apply this function to data frames instead of vectors?
Take these data frames as an example:
a2 <- c(1,2,NA,NA,NA,6,NA,8)
b2 <- c(1,NA,NA,4,NA,6,NA,NA)
df1 <- data.frame(a,b)
df2 <- data.frame(a2,b2)
What I expect as a result is 5, since that is the total number of NA mismatches between df1 and df2. Any suggestions on how to make this work?
Upvotes: 0
Views: 1174
Reputation: 70633
Here's a second thought.
xy1 <- data.frame(a = c(NA, 2, 3), b = rnorm(3))
xy2 <- data.frame(a = c(NA, 2, 4), b = rnorm(3))
com <- intersect(colnames(xy1), colnames(xy2))
# counts the cells that are equal in both data frames; comparisons involving NA are dropped
sum(xy1[, com] == xy2[, com], na.rm = TRUE)
If you don't want to worry about column names (but you should), you can make sure the columns align perfectly. In that case, the intersect step is redundant.
sum(xy1 == xy2, na.rm = TRUE)
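Note that the snippets above count cells that are equal. To count mismatches instead, treating two NAs as a match the way compareNA does in the question, a minimal sketch could compare the data frames as matrices. It assumes both data frames have the same dimensions and column order; the names m1, m2 and n_mismatch are made up for illustration.

```r
# Data from the question
a  <- c(1, 2, NA, 4, 5, 6, NA, 8)
b  <- c(NA, 2, NA, 4, NA, 6, NA, 8)
a2 <- c(1, 2, NA, NA, NA, 6, NA, 8)
b2 <- c(1, NA, NA, 4, NA, 6, NA, NA)
df1 <- data.frame(a, b)
df2 <- data.frame(a2, b2)

# Assumption: same dimensions and column order in both data frames
m1 <- as.matrix(df1)
m2 <- as.matrix(df2)

# TRUE where the cells are equal or both NA
same <- (m1 == m2) | (is.na(m1) & is.na(m2))
# cells where exactly one side is NA come out as NA; count them as mismatches
same[is.na(same)] <- FALSE

n_mismatch <- sum(!same)
n_mismatch
# [1] 5
```

Converting to matrices compares cell by cell without relying on column names, so it is only safe when the columns are already aligned.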
Upvotes: 2
Reputation: 23099
A third way (assuming df1 and df2 have the same dimensions):
sum(sapply(seq_len(ncol(df1)), function(x) compareNA(df1[, x], df2[, x])))
# 5
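The same column-by-column idea can also be sketched with mapply, which walks over the columns of both data frames in parallel. The block below is only an illustration: it redefines compareNA in a vectorised form that should be equivalent to the one in the question, adds an explicit check for the equal-dimensions assumption, and uses the made-up names per_column and total.

```r
# Vectorised equivalent of the question's compareNA (assumed, not the original code)
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  sum(!same)
}

# Data from the question
a  <- c(1, 2, NA, 4, 5, 6, NA, 8)
b  <- c(NA, 2, NA, 4, NA, 6, NA, 8)
a2 <- c(1, 2, NA, NA, NA, 6, NA, 8)
b2 <- c(1, NA, NA, 4, NA, 6, NA, NA)
df1 <- data.frame(a, b)
df2 <- data.frame(a2, b2)

# Fail early if the equal-dimensions assumption does not hold
stopifnot(identical(dim(df1), dim(df2)))

# Columns are paired by position: a with a2, b with b2
per_column <- mapply(compareNA, df1, df2)
total <- sum(per_column)
total
# [1] 5
```

Keeping per_column around shows where the mismatches come from (2 in the first column, 3 in the second) before they are summed.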
Upvotes: 0
Reputation: 680
It would be easier to give both data frames the same column names and compare them column by column, matching columns by name. You can then loop over the shared columns and add the result of your function to a running total.
compareNA.df <- function(df1, df2) {
  total <- 0
  common_columns <- intersect(colnames(df1), colnames(df2))
  for (col in common_columns) {
    total <- total + compareNA(df1[[col]], df2[[col]])
  }
  return(total)
}
colnames(df2) <- c("a", "b")
compareNA.df(df1, df2)
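The renaming step matters because columns are matched by name. As a rough, self-contained illustration using the question's data (compareNA is rewritten here in a vectorised form, and res_before and res_after are made-up names), the call returns 0 before the rename simply because no column names are shared:

```r
# Vectorised equivalent of the question's compareNA (assumed, not the original code)
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  sum(!same)
}

# Sum compareNA over the columns that both data frames share by name
compareNA.df <- function(df1, df2) {
  total <- 0
  common_columns <- intersect(colnames(df1), colnames(df2))
  for (col in common_columns) {
    total <- total + compareNA(df1[[col]], df2[[col]])
  }
  total
}

# Data from the question
a  <- c(1, 2, NA, 4, 5, 6, NA, 8)
b  <- c(NA, 2, NA, 4, NA, 6, NA, 8)
a2 <- c(1, 2, NA, NA, NA, 6, NA, 8)
b2 <- c(1, NA, NA, 4, NA, 6, NA, NA)
df1 <- data.frame(a, b)
df2 <- data.frame(a2, b2)

# No shared names ("a", "b" vs "a2", "b2"), so nothing is compared
res_before <- compareNA.df(df1, df2)
res_before
# [1] 0

# After aligning the names, every column pair is compared
colnames(df2) <- colnames(df1)
res_after <- compareNA.df(df1, df2)
res_after
# [1] 5
```

A result of 0 can therefore mean either "no mismatches" or "no common columns", so it may be worth checking that common_columns is not empty.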
Upvotes: 0