Anirudh Dutt

Manipulating and tagging the duplicate entries in R dataframe

I have a dataframe which looks like this:

id Name  Desc
 1 A     abc
 1 A     abc
 1 B     def
 2 C     ghi
 2 D     jkl
 3 E     mno
 4 F     pqr

I want to identify the duplicated entries (rows where id, Name and Desc all repeat) and mark them as follows:

id Name  Desc Person
 1 A     abc  Same Person
 1 A     abc  Same Person
 1 B     def  Different Person
 2 C     ghi  Different Person
 2 D     jkl  Different Person
 3 E     mno  Different Person
 4 F     pqr  Different Person

Please help!

Answers (1)

akrun

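We can create a logical vector with duplicated, checking both from the top and from the bottom (fromLast = TRUE) so that every copy of a repeated row is flagged. Adding 1 converts that logical vector into a numeric index (1 or 2), which then picks the matching label from an input vector.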

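# TRUE for every row that has an exact copy elsewhere (checked from both ends);
# adding 1 turns FALSE/TRUE into the index 1/2
df1$Person <- c("Different Person", "Same Person")[
  (duplicated(df1) | duplicated(df1, fromLast = TRUE)) + 1]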
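To see why both calls to duplicated are needed, here is a sketch that runs the same idea step by step on the sample data (df1 is rebuilt inline with data.frame() so the block runs on its own; the intermediate names first_pass, last_pass, dup_any and idx are only illustrative). duplicated(df1) flags only the later copy of a repeated row, duplicated(df1, fromLast = TRUE) flags only the earlier one, so OR-ing them marks every copy.

```r
# Sample data from the question
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Name = c("A", "A", "B", "C", "D", "E", "F"),
  Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr")
)

first_pass <- duplicated(df1)                  # TRUE only for the second "1 A abc" row
last_pass  <- duplicated(df1, fromLast = TRUE) # TRUE only for the first "1 A abc" row
dup_any    <- first_pass | last_pass           # TRUE for every copy of a repeated row

idx <- dup_any + 1                             # FALSE -> 1, TRUE -> 2
df1$Person <- c("Different Person", "Same Person")[idx]
print(df1)
```

Using only duplicated(df1) would label the first "1 A abc" row as "Different Person", which is not what the expected output shows.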

Or with dplyr

library(dplyr)
df1 %>% 
  group_by_all() %>%   # one group per distinct combination of all columns
  mutate(Person = case_when(n() > 1 ~ "Same Person", TRUE ~ "Different Person")) %>%
  ungroup()
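The dplyr version labels a row by the size of its group of identical rows (n() > 1). As a rough sketch of the same counting idea in base R without dplyr, one could paste all columns into a single key per row and count the rows sharing each key with ave(). The names key and group_size, and the tab separator, are my own choices; the separator assumes no value contains a tab character.

```r
# Sample data from the question
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Name = c("A", "A", "B", "C", "D", "E", "F"),
  Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr")
)

# One string key per row built from all columns (assumes no tabs in the data)
key <- do.call(paste, c(df1, sep = "\t"))

# Number of rows sharing each key: the base-R analogue of n() per group
group_size <- ave(seq_along(key), key, FUN = length)

df1$Person <- ifelse(group_size > 1, "Same Person", "Different Person")
print(df1)
```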

data

df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 4L), Name = c("A", 
"A", "B", "C", "D", "E", "F"), Desc = c("abc", "abc", "def", 
"ghi", "jkl", "mno", "pqr")), class = "data.frame", row.names = c(NA, 
 -7L))
