Reputation: 730
I'm trying to compare 2 columns, ID
and add
. Using ID as the key, if the corresponding add
are different, diff
should show "Yes."
df <- data.frame(ID = c("1234", "1234", "7491", "7319", "321", "321"), add = c("ABC", "DEF", "HIJ", "KLM", "WXY", "WXY"))
Expected Output
ID add diff
1 1234 ABC Yes
2 1234 DEF Yes
3 7491 HIJ No
4 7319 KLM No
5 321 WXY No
6 321 WXY No
Upvotes: 0
Views: 73
Reputation: 8374
You can have also a dplyr
solution:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(diff = ifelse(length(unique(add))>1, "YES", "NO")) # n_distict(add)>1 will also work
#mutate(diff = ifelse(n_distinct(add)>1, "YES", "NO"))
# # A tibble: 6 x 3
# # Groups: ID [4]
# ID add diff
# <fct> <fct> <chr>
# 1 1234 ABC YES
# 2 1234 DEF YES
# 3 7491 HIJ NO
# 4 7319 KLM NO
# 5 321 WXY NO
# 6 321 WXY NO
Upvotes: 1
Reputation: 73702
A base R approach would be:
df$diff <- sapply(df$ID, function(x) {
s <- df$add[df$ID == x]
length(s) != 1 & length(unique(s)) != 1
})
> df
ID add diff
1 1234 ABC TRUE
2 1234 DEF TRUE
3 7491 HIJ FALSE
4 7319 KLM FALSE
5 321 WXY FALSE
6 321 WXY FALSE
If you're depending on Yes-No do, ifelse(df$diff, "Yes", "No")
.
Or – as @sindri_baldur suggested – do this, which is faster:
unlist(sapply(unique(df$ID), function(x) {
rows <- df$ID == x
s <- df$add[rows]
rep(length(s) != 1 & length(unique(s)) != 1, sum(rows))
}))
Upvotes: 1
Reputation: 33613
Using data.table
:
setDT(df)
df[, diff := if (uniqueN(add) > 1) "Yes" else "No", by = ID]
df
ID add diff
1: 1234 ABC Yes
2: 1234 DEF Yes
3: 7491 HIJ No
4: 7319 KLM No
5: 321 WXY No
6: 321 WXY No
Upvotes: 3