Reputation: 93
I have a dataframe which looks like this:
id Name Desc
1 A abc
1 A abc
1 B def
2 C ghi
2 D jkl
3 E mno
4 F pqr
I want to identify the duplicated rows and mark them as follows:
id Name Desc Person
1 A abc Same Person
1 A abc Same Person
1 B def Different Person
2 C ghi Different Person
2 D jkl Different Person
3 E mno Different Person
4 F pqr Different Person
Please help!
Upvotes: 2
Views: 100
Reputation: 887078
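We can create a logical vector with duplicated, convert it to a numeric index by adding 1, and use that index to pick the matching label from a two-element character vector. Combining duplicated(df1) with duplicated(df1, fromLast = TRUE) flags every copy of a repeated row, including the first one.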
df1$Person <- c("Different Person", "Same Person")[
  (duplicated(df1) | duplicated(df1, fromLast = TRUE)) + 1]
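To see why the one-liner works, it may help to split it into steps. The sketch below rebuilds the example data so it runs on its own; the intermediate names (is_dup, idx, labels) are only illustrative and are not part of the answer above.

```r
# Example data, same values as df1 in this answer
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Name = c("A", "A", "B", "C", "D", "E", "F"),
  Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr"),
  stringsAsFactors = FALSE
)

# duplicated() alone flags only the second and later copies of a row;
# OR-ing it with the fromLast = TRUE version also flags the first copy.
is_dup <- duplicated(df1) | duplicated(df1, fromLast = TRUE)

# FALSE + 1 is 1 and TRUE + 1 is 2, so idx is a valid index into labels.
idx <- is_dup + 1

labels <- c("Different Person", "Same Person")
df1$Person <- labels[idx]

print(df1)
```

Note that rows are compared across all columns here, so the third row (id 1, Name B) counts as a different person even though it shares an id with the first two.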
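Or with dplyr, grouping by every column and labelling groups that contain more than one row:
library(dplyr)
df1 %>%
  # group_by_all() is superseded in dplyr >= 1.0.0;
  # group_by(across(everything())) is the current equivalent
  group_by_all() %>%
  mutate(Person = case_when(n() > 1 ~ "Same Person", TRUE ~ "Different Person")) %>%
  ungroup()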
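If dplyr is not available, the same "count rows per group" idea can be sketched in base R with ave(). This is an alternative I am assuming fits the question, not something taken from the answer above; the variable n is just an illustrative name.

```r
# Example data, same values as df1 in this answer
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Name = c("A", "A", "B", "C", "D", "E", "F"),
  Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr"),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share its (id, Name, Desc) combination.
n <- ave(seq_len(nrow(df1)), df1$id, df1$Name, df1$Desc, FUN = length)

# Groups with more than one row are labelled as the same person.
df1$Person <- ifelse(n > 1, "Same Person", "Different Person")

print(df1)
```

Dropping a column from the ave() call (for example grouping on id and Name only) changes what counts as a duplicate, which may be useful if Desc is allowed to differ.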
# data used in the examples above
df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 4L), Name = c("A",
"A", "B", "C", "D", "E", "F"), Desc = c("abc", "abc", "def",
"ghi", "jkl", "mno", "pqr")), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 2