gojomoso
gojomoso

Reputation: 163

Evaluate if a Value Does Not Exists in Another Table

df1:

a = c(2, 3, 5, 8, 10, 12) 
b = c("NA", "bb", "cc", "aa", "bb", "aa") 
c = c("bb", "aa", "bb", "cc", "aa", "aa")
d = c("aa", "cc", "bb", "aa", "aa", "aa")
e = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE) 
df1 = data.frame(a, b, c, d, e)

df2:

f = c("aa", "bb")
df2 = data.frame(f)

I want to evaluate df1 columns b,c,d to see if the value is not in df2. If it is, then keep the value. If it's not then input "Rare". Needs to ignore NAs.

Output:

a  b    c    d    e
2  NA   bb   aa   true
3  bb   aa   rare false
5  rare bb   bb   true
8  aa   rare aa   false
10 bb   aa   aa   true
12 aa   aa   aa   false

Upvotes: 3

Views: 73

Answers (3)

Sotos
Sotos

Reputation: 51592

Make sure that your columns are as.character(), then (and considering your NA is a string just as you define it)

df1[2:4] <- lapply(df1[2:4], function(i)replace(i, !i %in% c('NA', df2$f), 'rare'))

df1
#   a    b    c    d     e
#1  2   NA   bb   aa  TRUE
#2  3   bb   aa rare FALSE
#3  5 rare   bb   bb  TRUE
#4  8   aa rare   aa FALSE
#5 10   bb   aa   aa  TRUE
#6 12   aa   aa   aa FALSE

Upvotes: 4

Edo
Edo

Reputation: 7818

Another solution with dplyr, but it uses where instead.

library(dplyr) # version >=1.0.0
df1 %>% mutate(across(where(is.character), ~ifelse((. %in% df2$f )| is.na(.), ., "rare")))
#>    a    b    c    d     e
#> 1  2 <NA>   bb   aa  TRUE
#> 2  3   bb   aa rare FALSE
#> 3  5 rare   bb   bb  TRUE
#> 4  8   aa rare   aa FALSE
#> 5 10   bb   aa   aa  TRUE
#> 6 12   aa   aa   aa FALSE

Upvotes: 1

Duck
Duck

Reputation: 39595

Try this:

library(dplyr)
#Data
a = c(2, 3, 5, 8, 10, 12) 
b = c(NA, "bb", "cc", "aa", "bb", "aa") 
c = c("bb", "aa", "bb", "cc", "aa", "aa")
d = c("aa", "cc", "bb", "aa", "aa", "aa")
e = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE) 
df1 = data.frame(a, b, c, d, e, stringsAsFactors = F)
#Data2
f = c("aa", "bb")
df2 = data.frame(f,stringsAsFactors = F)
#Code
df1 %>% mutate(across(c(b:d), ~ ifelse(.%in% df2$f | is.na(.), ., 'rare')))

Output:

   a    b    c    d     e
1  2 <NA>   bb   aa  TRUE
2  3   bb   aa rare FALSE
3  5 rare   bb   bb  TRUE
4  8   aa rare   aa FALSE
5 10   bb   aa   aa  TRUE
6 12   aa   aa   aa FALSE

Upvotes: 3

Related Questions