Reputation: 25
I am trying to remove duplicate emails from a column of my data.frame in R using duplicated() and distinct(). However, I do not want to delete the whole row, just the duplicate email addresses in that column. Is there any way to do that with these functions, or is there another way to do this?
library(tidyverse)
patient2 <- c('John Doe','Peter Gynn','Jolie Hope', "Mycroft Holmes",
"Carrie Bird", "Carrie Bird", "Marcus Quimby", "Jennifer Poe", "Donna Moon")
salary2 <- c(21000, 23400, 26800, 40000, 50000, 33000, 24000, 75000, 90000)
email2 <- c("[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]")
startdate2 <- as.Date(c('2010-11-1','2008-3-25','2007-3-14', '2020-7-19',
'2019-4-20', '2018-2-13', '2017-4-21', '2019-6-10', '2010-9-19'))
patient.data_2 <- data.frame(patient2, salary2, email2, startdate2)
print(patient.data_2)
patient2        salary2  email2             startdate2
<fctr>            <dbl>  <fctr>             <date>
John Doe          21000  [email protected]  2010-11-01
Peter Gynn        23400  [email protected]  2008-03-25
Jolie Hope        26800  [email protected]  2007-03-14
Mycroft Holmes    40000  [email protected]  2020-07-19
Carrie Bird       50000  [email protected]  2019-04-20
Carrie Bird       33000  [email protected]  2018-02-13
Marcus Quimby     24000  [email protected]  2017-04-21
Jennifer Poe      75000  [email protected]  2019-06-10
Donna Moon        90000  [email protected]  2010-09-19
extracted <- patient.data_2[!duplicated(patient.data_2$email2), ]
extracted
All I would like to do is remove the duplicate email for Carrie Bird, not the entire row, because the start date is different. I tried duplicated() and distinct(), and both removed the entire row.
Upvotes: 1
Views: 88
Reputation: 886938
Using dplyr
library(dplyr)
dat <- dat %>%
  mutate(a = replace(a, duplicated(a), NA))
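Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The data frame name `patients` and the email addresses are hypothetical placeholders (the real addresses are redacted in the question), and only three rows are included to keep it short.

```r
library(dplyr)

# Hypothetical stand-in for patient.data_2; the email values are invented
# placeholders, not the addresses from the question.
patients <- data.frame(
  patient2 = c("Carrie Bird", "Carrie Bird", "Donna Moon"),
  email2 = c("carrie@example.com", "carrie@example.com", "donna@example.com"),
  startdate2 = as.Date(c("2019-04-20", "2018-02-13", "2010-09-19")),
  stringsAsFactors = FALSE
)

# Keep every row; replace only the repeated email values with NA.
patients <- patients %>%
  mutate(email2 = replace(email2, duplicated(email2), NA))
```

Both Carrie Bird rows should remain, with the email kept on the first one only.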
Upvotes: 1
Reputation: 17725
You could use the duplicated() function:
dat <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4))
dat$a[duplicated(dat$a)] <- NA
dat
#> a
#> 1 1
#> 2 NA
#> 3 2
#> 4 NA
#> 5 3
#> 6 NA
#> 7 4
#> 8 NA
#> 9 NA
#> 10 NA
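To see why this keeps the rows, it may help to look at what duplicated() returns on its own. The sketch below uses a hypothetical `email` vector with invented placeholder addresses, not the data from the question.

```r
# Hypothetical email column; the addresses are invented placeholders.
email <- c("a@example.com", "b@example.com", "b@example.com", "c@example.com")

# duplicated() is FALSE for the first occurrence of a value and TRUE for
# each later repeat.
dup <- duplicated(email)

# Assigning NA through that logical index leaves the vector length unchanged,
# so a data frame holding this column would keep all of its rows.
email[dup] <- NA

# fromLast = TRUE flags the earlier occurrences instead, keeping the last one.
dup_from_last <- duplicated(c("x", "y", "y"), fromLast = TRUE)
```

Indexing the data frame with `!duplicated(...)` in the row position, as in the question, drops rows; indexing the column itself only overwrites values.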
Upvotes: 1