Reputation: 51
The following question is about detecting if a row has duplicate cell. I would like to ask how to remove the 2nd, 3rd, .... duplicates (keeping only one cell)
Removing duplicate values row-wise in R
Example; team #6595380 (4th row) has a repeat member - person #24432987 appears as in the Person1 column and the Person5 column.
Team Person1 Person2 Person3 Person4 Person5 Person6 Person7
6594794 37505959 37469784 NA NA NA NA NA
6595053 30113392 33080042 21537147 32293683 NA NA NA
6595201 697417 22860111 NA NA NA NA NA
6595380 24432987 32370372 11521625 362790 24432987 22312802 32432267
6595382 12317669 25645492 NA NA NA NA NA
6595444 8114419 236357 32545314 22247108 NA NA NA
6595459 2135269 32332907 32332907 32436550 NA NA NA
6595468 33590928 10905322 32319555 10439608 NA NA NA
6595485 33080810 33162061 NA NA NA NA NA
6595496 36901773 34931641 NA NA NA NA NA
6595523 512193 8747403 NA NA NA NA NA
6595524 32393404 113514 NA NA NA NA NA
6595526 37855554 37855512 NA NA NA NA NA
6595536 18603977 1882599 332261 10969771 712339 2206680 768785
Upvotes: 3
Views: 94
Reputation: 78917
With base R: Get the indices of the duplicates and replace them NA.
dat[which(t(apply(dat,1,function(x) duplicated(x))), arr.ind = T)] <- "NA"
Team Person1 Person2 Person3 Person4 Person5 Person6 Person7
1 6594794 37505959 37469784 <NA> NA NA NA NA
2 6595053 30113392 33080042 21537147 32293683 <NA> NA NA
3 6595201 697417 22860111 <NA> NA NA NA NA
4 6595380 24432987 32370372 11521625 362790 NA 22312802 32432267
5 6595382 12317669 25645492 <NA> NA NA NA NA
6 6595444 8114419 236357 32545314 22247108 <NA> NA NA
7 6595459 2135269 32332907 NA 32436550 <NA> NA NA
8 6595468 33590928 10905322 32319555 10439608 <NA> NA NA
9 6595485 33080810 33162061 <NA> NA NA NA NA
10 6595496 36901773 34931641 <NA> NA NA NA NA
11 6595523 512193 8747403 <NA> NA NA NA NA
12 6595524 32393404 113514 <NA> NA NA NA NA
13 6595526 37855554 37855512 <NA> NA NA NA NA
14 6595536 18603977 1882599 332261 10969771 712339 2206680 768785
Upvotes: 3
Reputation: 25313
If your goal is to transform the duplicated into NA
, then you can use duplicated
to, as a logical value, identify the duplicated rows (after pivot_longer
):
library(tidyverse)
df %>%
pivot_longer(cols = -Team) %>%
mutate(value = if_else(duplicated(cbind(Team, value)), NA_integer_, value)) %>%
pivot_wider(Team, names_from = name, names_sort = T)
#> # A tibble: 14 × 8
#> Team Person1 Person2 Person3 Person4 Person5 Person6 Person7
#> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 6594794 37505959 37469784 NA NA NA NA NA
#> 2 6595053 30113392 33080042 21537147 32293683 NA NA NA
#> 3 6595201 697417 22860111 NA NA NA NA NA
#> 4 6595380 24432987 32370372 11521625 362790 NA 22312802 32432267
#> 5 6595382 12317669 25645492 NA NA NA NA NA
#> 6 6595444 8114419 236357 32545314 22247108 NA NA NA
#> 7 6595459 2135269 32332907 NA 32436550 NA NA NA
#> 8 6595468 33590928 10905322 32319555 10439608 NA NA NA
#> 9 6595485 33080810 33162061 NA NA NA NA NA
#> 10 6595496 36901773 34931641 NA NA NA NA NA
#> 11 6595523 512193 8747403 NA NA NA NA NA
#> 12 6595524 32393404 113514 NA NA NA NA NA
#> 13 6595526 37855554 37855512 NA NA NA NA NA
#> 14 6595536 18603977 1882599 332261 10969771 712339 2206680 768785
If the goal is to remove the repeated cell:
library(tidyverse)
df %>%
pivot_longer(cols = -Team, values_drop_na = T) %>%
distinct(Team, value, .keep_all = TRUE) %>%
pivot_wider(Team, names_from = name)
#> # A tibble: 14 × 8
#> Team Person1 Person2 Person3 Person4 Person6 Person7 Person5
#> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 6594794 37505959 37469784 NA NA NA NA NA
#> 2 6595053 30113392 33080042 21537147 32293683 NA NA NA
#> 3 6595201 697417 22860111 NA NA NA NA NA
#> 4 6595380 24432987 32370372 11521625 362790 22312802 32432267 NA
#> 5 6595382 12317669 25645492 NA NA NA NA NA
#> 6 6595444 8114419 236357 32545314 22247108 NA NA NA
#> 7 6595459 2135269 32332907 NA 32436550 NA NA NA
#> 8 6595468 33590928 10905322 32319555 10439608 NA NA NA
#> 9 6595485 33080810 33162061 NA NA NA NA NA
#> 10 6595496 36901773 34931641 NA NA NA NA NA
#> 11 6595523 512193 8747403 NA NA NA NA NA
#> 12 6595524 32393404 113514 NA NA NA NA NA
#> 13 6595526 37855554 37855512 NA NA NA NA NA
#> 14 6595536 18603977 1882599 332261 10969771 2206680 768785 712339
Upvotes: 3
Reputation: 19088
Try this base R using apply
data.frame(Team=df1$Team, t(apply(df1[,-1], 1, function(x)
ifelse(!is.na(x)&duplicated(as.vector(x)),NA,x))))
Team Person1 Person2 Person3 Person4 Person5 Person6 Person7
1 6594794 37505959 37469784 NA NA NA NA NA
2 6595053 30113392 33080042 21537147 32293683 NA NA NA
3 6595201 697417 22860111 NA NA NA NA NA
4 6595380 24432987 32370372 11521625 362790 NA 22312802 32432267
5 6595382 12317669 25645492 NA NA NA NA NA
6 6595444 8114419 236357 32545314 22247108 NA NA NA
7 6595459 2135269 32332907 NA 32436550 NA NA NA
8 6595468 33590928 10905322 32319555 10439608 NA NA NA
9 6595485 33080810 33162061 NA NA NA NA NA
10 6595496 36901773 34931641 NA NA NA NA NA
11 6595523 512193 8747403 NA NA NA NA NA
12 6595524 32393404 113514 NA NA NA NA NA
13 6595526 37855554 37855512 NA NA NA NA NA
14 6595536 18603977 1882599 332261 10969771 712339 2206680 768785
Upvotes: 1