Reputation: 391
I have a dataframe that includes duplicates across three columns:
Name   Year  Job1      Job2       Job3
Bob    2011  director  director   chair
Bob    2012  director  chair
Wendy  2011  advisor   chair      advisor
Henry  2010  CEO       president  president
I want to remove the duplicates among the columns "Job1", "Job2" and "Job3" within each row:
Name   Year  Job1      Job2       Job3
Bob    2011  director  NA         chair
Bob    2012  director  chair
Wendy  2011  advisor   chair      NA
Henry  2010  CEO       president  NA
Basically, if duplicates exist within a row, the value in the earlier column stays and the value in the later column is replaced with NA (for example, if "Job1" and "Job2" hold the same value, the value in "Job1" remains).
Upvotes: 2
Views: 28
Reputation: 886938
We can loop over the 'Job' columns row-wise and replace the duplicates within each row with NA:
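# Indices of the columns named Job1, Job2, ...
nm1 <- grep('^Job\\d+$', names(df1))
# For each row, set repeated values to NA; t() restores the row-by-column layout
df1[nm1] <- t(apply(df1[nm1], 1, function(x) replace(x, duplicated(x), NA)))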
Output:
df1
#   Name Year     Job1      Job2  Job3
#1   Bob 2011 director      <NA> chair
#2   Bob 2012 director     chair
#3 Wendy 2011  advisor     chair  <NA>
#4 Henry 2010      CEO president  <NA>
Data:
df1 <- structure(list(
  Name = c("Bob", "Bob", "Wendy", "Henry"),
  Year = c(2011L, 2012L, 2011L, 2010L),
  Job1 = c("director", "director", "advisor", "CEO"),
  Job2 = c("director", "chair", "chair", "president"),
  Job3 = c("chair", "", "advisor", "president")),
  class = "data.frame", row.names = c(NA, -4L))
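One edge case worth noting: duplicated() treats an empty string like any other value, so a row with two blank job cells would have the second blank turned into NA by the approach above. The following is a minimal sketch that leaves blanks and existing NAs alone, assuming that is the behavior you want. The names drop_row_dups and df2 are hypothetical and used only for illustration; they are not part of the original data.

```r
# Hypothetical helper: set repeated values within one row to NA,
# but never flag empty strings or existing NAs as duplicates.
drop_row_dups <- function(x) {
  is_blank <- is.na(x) | !nzchar(x)
  replace(x, duplicated(x) & !is_blank, NA)
}

# Small illustrative data frame (assumed, not the original data);
# the second row has two blank job cells.
df2 <- data.frame(
  Name = c("Bob", "Ann"),
  Year = c(2011L, 2012L),
  Job1 = c("director", "chair"),
  Job2 = c("director", ""),
  Job3 = c("chair", ""),
  stringsAsFactors = FALSE
)

# Same column selection and row-wise apply as in the answer above
job_cols <- grep("^Job\\d+$", names(df2))
df2[job_cols] <- t(apply(df2[job_cols], 1, drop_row_dups))
```

With this sketch, the repeated "director" in the first row should become NA, while both blanks in the second row are kept as empty strings. If blanks should instead count as duplicates, the original one-liner already does that.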
Upvotes: 2