Javier
Javier

Reputation: 730

Comparing Differences in 2 Columns

I'm trying to compare 2 columns, ID and add. Using ID as the key, if the corresponding add are different, diff should show "Yes."

df <- data.frame(ID = c("1234", "1234", "7491", "7319", "321", "321"), add = c("ABC", "DEF", "HIJ", "KLM", "WXY", "WXY"))

Expected Output

    ID add diff
1 1234 ABC  Yes
2 1234 DEF  Yes
3 7491 HIJ   No
4 7319 KLM   No
5  321 WXY   No
6  321 WXY   No

Upvotes: 0

Views: 73

Answers (3)

RLave
RLave

Reputation: 8374

You can have also a dplyr solution:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  mutate(diff = ifelse(length(unique(add))>1, "YES", "NO")) # n_distict(add)>1 will also work 
  #mutate(diff = ifelse(n_distinct(add)>1, "YES", "NO"))
# # A tibble: 6 x 3
# # Groups:   ID [4]
# ID    add   diff 
# <fct> <fct> <chr>
# 1 1234  ABC   YES  
# 2 1234  DEF   YES  
# 3 7491  HIJ   NO   
# 4 7319  KLM   NO   
# 5 321   WXY   NO   
# 6 321   WXY   NO

Upvotes: 1

jay.sf
jay.sf

Reputation: 73702

A base R approach would be:

df$diff <- sapply(df$ID, function(x) {
  s <- df$add[df$ID == x]
  length(s) != 1 & length(unique(s)) != 1
})

> df
    ID add  diff
1 1234 ABC  TRUE
2 1234 DEF  TRUE
3 7491 HIJ FALSE
4 7319 KLM FALSE
5  321 WXY FALSE
6  321 WXY FALSE

If you're depending on Yes-No do, ifelse(df$diff, "Yes", "No").

Or – as @sindri_baldur suggested – do this, which is faster:

unlist(sapply(unique(df$ID), function(x) {
  rows <- df$ID == x
  s <- df$add[rows]
  rep(length(s) != 1 & length(unique(s)) != 1, sum(rows)) 
}))

Upvotes: 1

s_baldur
s_baldur

Reputation: 33613

Using data.table:

setDT(df)
df[, diff := if (uniqueN(add) > 1) "Yes" else "No", by = ID]
df

     ID add diff
1: 1234 ABC  Yes
2: 1234 DEF  Yes
3: 7491 HIJ   No
4: 7319 KLM   No
5:  321 WXY   No
6:  321 WXY   No

Upvotes: 3

Related Questions