Remove duplicates from ONE column not row

Question

I am trying to remove duplicate emails in a column of my data.frame using duplicated() and distinct() in R. However, I do not want the whole row deleted, just the duplicate email addresses in that column. Is there any way to do that with these functions, or is there another way?

library(tidyverse)

patient2 <- c('John Doe','Peter Gynn','Jolie Hope', "Mycroft Holmes", 
"Carrie Bird", "Carrie Bird", "Marcus Quimby", "Jennifer Poe", "Donna Moon")
salary2 <- c(21000, 23400, 26800, 40000, 50000, 33000, 24000, 75000, 90000)
email2 <- c("doe@gmail.com", "gynn@gmail.com", "hope@gmail.com", 
"holmes@gmail.com", "bird@gmail.com", "bird@gmail.com", "quimby@gmail.com", 
"poe@gmail.com", "moon@gmail.com")
startdate2 <- as.Date(c('2010-11-1','2008-3-25','2007-3-14', '2020-7-19', 
'2019-4-20', '2018-2-13', '2017-4-21', '2019-6-10', '2010-9-19'))

patient.data_2 <- data.frame(patient2, salary2, email2, startdate2)
print(patient.data_2)

patient2       salary2  email2              startdate2
John Doe       21000    doe@gmail.com       2010-11-01  
Peter Gynn     23400    gynn@gmail.com      2008-03-25  
Jolie Hope     26800    hope@gmail.com      2007-03-14  
Mycroft Holmes 40000    holmes@gmail.com    2020-07-19  
Carrie Bird    50000    bird@gmail.com      2019-04-20  
Carrie Bird    33000    bird@gmail.com      2018-02-13  
Marcus Quimby  24000    quimby@gmail.com    2017-04-21  
Jennifer Poe   75000    poe@gmail.com       2019-06-10  
Donna Moon     90000    moon@gmail.com      2010-09-19    

extracted <- patient.data_2[!duplicated(patient.data_2$email2), ]
extracted

All I would like to do is remove the duplicate email for Carrie Bird, not the entire row, because the start date is different. I tried duplicated() and distinct(), and both removed the entire row.

Vincent · Accepted Answer

You could use the duplicated() function to overwrite only the repeated values with NA, which leaves the rows themselves in place:

dat <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4))
dat$a[duplicated(dat$a)] <- NA
dat
#>     a
#> 1   1
#> 2  NA
#> 3   2
#> 4  NA
#> 5   3
#> 6  NA
#> 7   4
#> 8  NA
#> 9  NA
#> 10 NA
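
Applied to the data from the question, the same idea might look like the sketch below. It assumes the column names from the post (patient.data_2, email2) and that blanking the repeated address with NA is an acceptable way of "removing" it; the salary column is left out only to keep the example short.

```r
# Rebuild part of the data frame from the question (values copied from the post).
patient.data_2 <- data.frame(
  patient2 = c("John Doe", "Peter Gynn", "Jolie Hope", "Mycroft Holmes",
               "Carrie Bird", "Carrie Bird", "Marcus Quimby", "Jennifer Poe",
               "Donna Moon"),
  email2 = c("doe@gmail.com", "gynn@gmail.com", "hope@gmail.com",
             "holmes@gmail.com", "bird@gmail.com", "bird@gmail.com",
             "quimby@gmail.com", "poe@gmail.com", "moon@gmail.com"),
  startdate2 = as.Date(c("2010-11-01", "2008-03-25", "2007-03-14",
                         "2020-07-19", "2019-04-20", "2018-02-13",
                         "2017-04-21", "2019-06-10", "2010-09-19"))
)

# duplicated() is TRUE for every repeat after the first occurrence,
# so only the second bird@gmail.com is replaced; no rows are dropped.
patient.data_2$email2[duplicated(patient.data_2$email2)] <- NA

print(patient.data_2)
```

All nine rows are kept, and only the email in the second Carrie Bird row becomes NA. With dplyr loaded, the same step can be written as mutate(patient.data_2, email2 = replace(email2, duplicated(email2), NA)).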
