m26348762
m26348762

Reputation: 51

Remove duplicate cell in a row

The following question is about detecting if a row has duplicate cell. I would like to ask how to remove the 2nd, 3rd, .... duplicates (keeping only one cell)

Removing duplicate values row-wise in R

Example; team #6595380 (4th row) has a repeat member - person #24432987 appears as in the Person1 column and the Person5 column.

Team    Person1   Person2   Person3   Person4   Person5  Person6  Person7
6594794 37505959  37469784    NA         NA       NA        NA      NA
6595053 30113392  33080042  21537147  32293683    NA        NA      NA
6595201 697417    22860111  NA           NA       NA        NA      NA
6595380 24432987  32370372  11521625   362790   24432987 22312802 32432267
6595382 12317669  25645492  NA           NA       NA        NA      NA
6595444 8114419   236357    32545314  22247108    NA        NA      NA
6595459 2135269   32332907  32332907  32436550    NA        NA      NA
6595468 33590928  10905322  32319555  10439608    NA        NA      NA
6595485 33080810  33162061  NA           NA       NA        NA      NA
6595496 36901773  34931641  NA           NA       NA        NA      NA
6595523 512193    8747403   NA           NA       NA        NA      NA
6595524 32393404  113514    NA           NA       NA        NA      NA
6595526 37855554  37855512  NA           NA       NA        NA      NA
6595536 18603977  1882599   332261    10969771  712339  2206680  768785

Upvotes: 3

Views: 94

Answers (3)

TarJae
TarJae

Reputation: 78917

With base R: Get the indices of the duplicates and replace them NA.

dat[which(t(apply(dat,1,function(x) duplicated(x))), arr.ind = T)] <- "NA"
      Team  Person1  Person2  Person3  Person4 Person5  Person6  Person7
1  6594794 37505959 37469784     <NA>       NA      NA       NA       NA
2  6595053 30113392 33080042 21537147 32293683    <NA>       NA       NA
3  6595201   697417 22860111     <NA>       NA      NA       NA       NA
4  6595380 24432987 32370372 11521625   362790      NA 22312802 32432267
5  6595382 12317669 25645492     <NA>       NA      NA       NA       NA
6  6595444  8114419   236357 32545314 22247108    <NA>       NA       NA
7  6595459  2135269 32332907       NA 32436550    <NA>       NA       NA
8  6595468 33590928 10905322 32319555 10439608    <NA>       NA       NA
9  6595485 33080810 33162061     <NA>       NA      NA       NA       NA
10 6595496 36901773 34931641     <NA>       NA      NA       NA       NA
11 6595523   512193  8747403     <NA>       NA      NA       NA       NA
12 6595524 32393404   113514     <NA>       NA      NA       NA       NA
13 6595526 37855554 37855512     <NA>       NA      NA       NA       NA
14 6595536 18603977  1882599   332261 10969771  712339  2206680   768785

Upvotes: 3

PaulS
PaulS

Reputation: 25313

If your goal is to transform the duplicated into NA, then you can use duplicated to, as a logical value, identify the duplicated rows (after pivot_longer):

library(tidyverse)

df %>% 
  pivot_longer(cols = -Team) %>% 
  mutate(value = if_else(duplicated(cbind(Team, value)), NA_integer_, value)) %>% 
  pivot_wider(Team, names_from = name, names_sort = T)

#> # A tibble: 14 × 8
#>       Team  Person1  Person2  Person3  Person4 Person5  Person6  Person7
#>      <int>    <int>    <int>    <int>    <int>   <int>    <int>    <int>
#>  1 6594794 37505959 37469784       NA       NA      NA       NA       NA
#>  2 6595053 30113392 33080042 21537147 32293683      NA       NA       NA
#>  3 6595201   697417 22860111       NA       NA      NA       NA       NA
#>  4 6595380 24432987 32370372 11521625   362790      NA 22312802 32432267
#>  5 6595382 12317669 25645492       NA       NA      NA       NA       NA
#>  6 6595444  8114419   236357 32545314 22247108      NA       NA       NA
#>  7 6595459  2135269 32332907       NA 32436550      NA       NA       NA
#>  8 6595468 33590928 10905322 32319555 10439608      NA       NA       NA
#>  9 6595485 33080810 33162061       NA       NA      NA       NA       NA
#> 10 6595496 36901773 34931641       NA       NA      NA       NA       NA
#> 11 6595523   512193  8747403       NA       NA      NA       NA       NA
#> 12 6595524 32393404   113514       NA       NA      NA       NA       NA
#> 13 6595526 37855554 37855512       NA       NA      NA       NA       NA
#> 14 6595536 18603977  1882599   332261 10969771  712339  2206680   768785

If the goal is to remove the repeated cell:

library(tidyverse)

df %>% 
  pivot_longer(cols = -Team, values_drop_na = T) %>% 
  distinct(Team, value, .keep_all = TRUE) %>% 
  pivot_wider(Team, names_from = name)

#> # A tibble: 14 × 8
#>       Team  Person1  Person2  Person3  Person4  Person6  Person7 Person5
#>      <int>    <int>    <int>    <int>    <int>    <int>    <int>   <int>
#>  1 6594794 37505959 37469784       NA       NA       NA       NA      NA
#>  2 6595053 30113392 33080042 21537147 32293683       NA       NA      NA
#>  3 6595201   697417 22860111       NA       NA       NA       NA      NA
#>  4 6595380 24432987 32370372 11521625   362790 22312802 32432267      NA
#>  5 6595382 12317669 25645492       NA       NA       NA       NA      NA
#>  6 6595444  8114419   236357 32545314 22247108       NA       NA      NA
#>  7 6595459  2135269 32332907       NA 32436550       NA       NA      NA
#>  8 6595468 33590928 10905322 32319555 10439608       NA       NA      NA
#>  9 6595485 33080810 33162061       NA       NA       NA       NA      NA
#> 10 6595496 36901773 34931641       NA       NA       NA       NA      NA
#> 11 6595523   512193  8747403       NA       NA       NA       NA      NA
#> 12 6595524 32393404   113514       NA       NA       NA       NA      NA
#> 13 6595526 37855554 37855512       NA       NA       NA       NA      NA
#> 14 6595536 18603977  1882599   332261 10969771  2206680   768785  712339

Upvotes: 3

Andre Wildberg
Andre Wildberg

Reputation: 19088

Try this base R using apply

data.frame(Team=df1$Team, t(apply(df1[,-1], 1, function(x)
  ifelse(!is.na(x)&duplicated(as.vector(x)),NA,x))))
      Team  Person1  Person2  Person3  Person4 Person5  Person6  Person7
1  6594794 37505959 37469784       NA       NA      NA       NA       NA
2  6595053 30113392 33080042 21537147 32293683      NA       NA       NA
3  6595201   697417 22860111       NA       NA      NA       NA       NA
4  6595380 24432987 32370372 11521625   362790      NA 22312802 32432267
5  6595382 12317669 25645492       NA       NA      NA       NA       NA
6  6595444  8114419   236357 32545314 22247108      NA       NA       NA
7  6595459  2135269 32332907       NA 32436550      NA       NA       NA
8  6595468 33590928 10905322 32319555 10439608      NA       NA       NA
9  6595485 33080810 33162061       NA       NA      NA       NA       NA
10 6595496 36901773 34931641       NA       NA      NA       NA       NA
11 6595523   512193  8747403       NA       NA      NA       NA       NA
12 6595524 32393404   113514       NA       NA      NA       NA       NA
13 6595526 37855554 37855512       NA       NA      NA       NA       NA
14 6595536 18603977  1882599   332261 10969771  712339  2206680   768785

Upvotes: 1

Related Questions