EML

Reputation: 671

Check if values in cell and column match values in a list

I have several hundred columns. For each column of df, I would like to check whether its values appear in the list stored in the corresponding column of df2.

Example data:

df <- data.frame(a=c("1","2","3","4"), b=c("1", NA,NA, "99"))
df$a <- as.character(df$a)
df$b <- as.character(df$b)
df2 <- data.frame(c=I(list(c("1","2","3"))), d=I(list(c("1","0"))))

> df
  a    b
1 1    1
2 2 <NA>
3 3 <NA>
4 4   99

> df2
        c    d
1 1, 2, 3 1, 0

I have tried the following function:

check <- function(dat1=df, dat2=df2) {
for(c in ncol(df)) {
 for(r in nrow(df)) {
   df[r,c] <- ifelse(df[r,c] %in% as.character(unlist(df2[1,c])),"match", "nomatch")
    }
}
return(df)
}
check(df, df2)

Output:

> check(df, df2)
  a       b
1 1       1
2 2    <NA>
3 3    <NA>
4 4 nomatch

Desired output:

> check(df, df2)
        a       b
1   match   match
2   match    <NA>
3   match    <NA>
4 nomatch nomatch

Upvotes: 0

Views: 298

Answers (3)

Ronak Shah

Reputation: 388972

You can use Map/mapply, which pairs each column of df with the column of df2 in the same position:

mat <- mapply(function(x, y) ifelse(is.na(x), NA, x %in% unlist(y)), df, df2)
#You can also use replace similarly
#mat <- mapply(function(x, y) replace(x %in% unlist(y), is.na(x), NA), df, df2)
mat
#         a     b
#[1,]  TRUE  TRUE
#[2,]  TRUE    NA
#[3,]  TRUE    NA
#[4,] FALSE FALSE

If needed, convert these TRUE/FALSE values to "match"/"No match". NA entries stay NA, because indexing with NA returns NA.

mat[] <- c('No match', 'match')[mat + 1]

#       a          b         
#[1,] "match"    "match"   
#[2,] "match"    NA        
#[3,] "match"    NA        
#[4,] "No match" "No match"
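If a data frame with the exact labels from the question ("match"/"nomatch") is wanted rather than a matrix, the same idea can be sketched with Map. This is only an illustration built on the example data above; the intermediate name res is an assumption, not part of the original answer.

```r
df <- data.frame(a = c("1", "2", "3", "4"), b = c("1", NA, NA, "99"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(c = I(list(c("1", "2", "3"))), d = I(list(c("1", "0"))))

# Map() pairs df$a with df2$c and df$b with df2$d (by position, not by name)
res <- Map(function(x, y) {
  out <- ifelse(x %in% unlist(y), "match", "nomatch")
  out[is.na(x)] <- NA  # keep missing values missing
  out
}, df, df2)

# Reassemble the list of columns into a data frame; names come from df
res <- as.data.frame(res, stringsAsFactors = FALSE)
res
```

Because the pairing is positional, df and df2 need to have their columns in the same order.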

Upvotes: 1

coffeinjunky

Reputation: 11514

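Try the following. Note that unlist(dat2) pools the values from every column of dat2, so each cell is checked against the combined list rather than only the list for its own column: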

check <- function(dat1, dat2){
  out <- ifelse(t(apply(dat1, 1, function(row) row %in% unlist(dat2))), "match", 'nomatch')
  out[is.na(dat1)] <- NA
  colnames(out) <- colnames(dat1)
  return(as.data.frame(out))
}

> check(df, df2)
        a       b
1   match   match
2   match    <NA>
3   match    <NA>
4 nomatch nomatch
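Whether the lookup is pooled or per column does not matter for the example data, but it can matter in general. A small sketch of the difference, using a hypothetical one-row data frame df3 (not from the question) whose b value appears in df2$c but not in df2$d:

```r
df2 <- data.frame(c = I(list(c("1", "2", "3"))), d = I(list(c("1", "0"))))

# Hypothetical test data: "2" is listed in df2$c but not in df2$d
df3 <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)

# Pooled lookup: values from every df2 column combined into one vector
pooled <- df3$b %in% unlist(df2)

# Column-wise lookup: only the list stored in the matching column
by_col <- df3$b %in% unlist(df2$d)

pooled  # TRUE
by_col  # FALSE
```

If each column must be checked only against its own list, a column-wise approach is the safer choice.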

Upvotes: 1

Mohanasundaram

Reputation: 2949

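You should loop over the range of row and column indices (for example 1:nrow(df)), not just the single values nrow(df) and ncol(df); for(r in nrow(df)) runs only once, for the last row. You also need to handle NA values explicitly inside the ifelse().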

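check <- function(dat1 = df, dat2 = df2) {
  # Loop over every column and row index, not just the last one
  for (c in seq_len(ncol(dat1))) {
    for (r in seq_len(nrow(dat1))) {
      # Keep missing values as NA; otherwise look the value up in the
      # list stored in the same column position of dat2
      dat1[r, c] <- ifelse(is.na(dat1[r, c]), NA,
                           ifelse(dat1[r, c] %in% as.character(unlist(dat2[1, c])),
                                  "match", "nomatch"))
    }
  }
  return(dat1)
}

> check(df, df2)
        a       b
1   match   match
2   match    <NA>
3   match    <NA>
4 nomatch nomatch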
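The point about ranges can be checked directly. The sketch below records which row indices each form of the loop visits, and shows why is.na() is the reliable test for missing values; the names visited_single and visited_range are made up for illustration.

```r
df <- data.frame(a = c("1", "2", "3", "4"), b = c("1", NA, NA, "99"),
                 stringsAsFactors = FALSE)

# for (r in nrow(df)) iterates over the single value 4
visited_single <- integer(0)
for (r in nrow(df)) visited_single <- c(visited_single, r)

# seq_len(nrow(df)) expands to 1, 2, 3, 4
visited_range <- integer(0)
for (r in seq_len(nrow(df))) visited_range <- c(visited_range, r)

visited_single  # 4
visited_range   # 1 2 3 4

# Comparing a missing value with == yields NA, not TRUE or FALSE
na_compare <- df$b[2] == "<NA>"
na_check <- is.na(df$b[2])
```

seq_len() is used here instead of 1:nrow(df) because it also behaves sensibly for a data frame with zero rows.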

Upvotes: 1
