Syb

Reputation: 25

Remove duplicates from ONE column not row

I am trying to remove duplicate emails from one column of my data.frame in R using duplicated() and distinct(). However, I do not want to delete the whole row, only the duplicate email addresses in that column. Is there any way to do that with these functions, or is there another way?

library(tidyverse)
patient2 <- c('John Doe','Peter Gynn','Jolie Hope', "Mycroft Holmes", "Carrie Bird",
"Carrie Bird", "Marcus Quimby", "Jennifer Poe", "Donna Moon")
salary2 <- c(21000, 23400, 26800, 40000, 50000, 33000, 24000, 75000, 90000)
email2 <- c("[email protected]", "[email protected]", "[email protected]", 
"[email protected]", "[email protected]", "[email protected]", "[email protected]", 
"[email protected]", "[email protected]")
startdate2 <- as.Date(c('2010-11-1','2008-3-25','2007-3-14', '2020-7-19', 
'2019-4-20', '2018-2-13', '2017-4-21', '2019-6-10', '2010-9-19'))

patient.data_2 <- data.frame(patient2, salary2, email2, startdate2)
print(patient.data_2)


patient2<fctr> salary2<dbl> email2<fctr> startdate2<date>
John Doe       21000    [email protected]       2010-11-01  
Peter Gynn     23400    [email protected]      2008-03-25  
Jolie Hope     26800    [email protected]      2007-03-14  
Mycroft Holmes 40000    [email protected]    2020-07-19  
Carrie Bird    50000    [email protected]      2019-04-20  
Carrie Bird    33000    [email protected]      2018-02-13  
Marcus Quimby  24000    [email protected]    2017-04-21  
Jennifer Poe   75000    [email protected]       2019-06-10  
Donna Moon     90000    [email protected]      2010-09-19    

# Attempt: this drops the whole row for every repeated email
extracted <- patient.data_2[!duplicated(patient.data_2$email2), ]
extracted

All I would like to do is remove the duplicate email for Carrie Bird, not the entire row, because the start date is different. I tried duplicated() and distinct(), and both removed the entire row.

Upvotes: 1

Views: 88

Answers (2)

akrun

Reputation: 886938

Using dplyr, replace the duplicated values with NA so that the rows themselves are kept:

library(dplyr)
dat <- dat %>% 
      mutate(a = replace(a, duplicated(a), NA))
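
Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The column names come from the question. The addresses are hypothetical placeholders, since the originals are not readable in the post.

```r
library(dplyr)

# Hypothetical stand-in data; the addresses are made up for illustration
patients <- data.frame(
  patient2   = c("Carrie Bird", "Carrie Bird", "Donna Moon"),
  email2     = c("carrie@example.com", "carrie@example.com", "donna@example.com"),
  startdate2 = as.Date(c("2019-04-20", "2018-02-13", "2010-09-19"))
)

# Keep every row; only the repeated addresses are set to NA
patients <- patients %>%
  mutate(email2 = replace(email2, duplicated(email2), NA))
```

Both Carrie Bird rows remain, and only the second occurrence of the address becomes NA.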

Upvotes: 1

Vincent

Reputation: 17725

You could use the duplicated() function to set the repeated values to NA:

dat <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4))
dat$a[duplicated(dat$a)] <- NA
dat
#>     a
#> 1   1
#> 2  NA
#> 3   2
#> 4  NA
#> 5   3
#> 6  NA
#> 7   4
#> 8  NA
#> 9  NA
#> 10 NA
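
The printed output in the question shows email2 as a factor column. Here is a minimal sketch assuming that case, with hypothetical addresses. Assigning NA works on a factor, whereas assigning an empty string would give a warning and an NA unless "" is already one of its levels.

```r
# Hypothetical factor column standing in for email2
emails <- factor(c("a@example.com", "b@example.com", "a@example.com"))

# The first occurrence is kept; later repeats become NA
emails[duplicated(emails)] <- NA

# The length is unchanged, so the column still lines up with the other columns
length(emails)
```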

Upvotes: 1
