Abdel
Abdel

Reputation: 6106

Replace duplicates with NAs within a row across columns

I have a data frame in R that looks like this:

ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA 1223
13 f 1.65 5664 4667 NA 4667
15 m 1.78 6663 NA 6663 NA

For each row, I want to only keep the unique variables among the four coordinate.x variables, and the duplicates should be replaced with NAs. The result should look like this:

ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA NA
13 f 1.65 5664 4667 NA NA
15 m 1.78 6663 NA NA NA

Any ideas on how to achieve this?

Upvotes: 0

Views: 59

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388982

Using apply for every row we replace the values which are duplicated with NA.

cols <- grep("^coordinate", names(df))
df[cols] <- t(apply(df[cols], 1, function(x) replace(x, duplicated(x), NA)))

df
#  ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
#1 12   m   1.81         1223           NA           NA           NA
#2 13   f   1.65         5664         4667           NA           NA
#3 15   m   1.78         6663           NA           NA           NA

A tidyverse approach would be by creating a row_number() for every row, gather all coordinate... values group_by the row number (ind), replace duplicates with NA and spread the values again in wide format.

library(tidyverse)

df %>%
  mutate(ind = row_number()) %>%
  gather(key, value, -(c(ind, ID:height))) %>%
  group_by(ind) %>%
  mutate(value = replace(value, duplicated(value), NA)) %>%
  spread(key, value) %>%
  ungroup() %>%
  select(-ind)


#       ID sex   height coordinate.1 coordinate.2 coordinate.3 coordinate.4
#     <int> <fct>  <dbl>        <int>        <int>        <int>        <int>
#1       12 m       1.81         1223           NA           NA           NA
#2       13 f       1.65         5664         4667           NA           NA
#3       15 m       1.78         6663           NA           NA           NA

Upvotes: 1

Sotos
Sotos

Reputation: 51592

Another idea for fun avoiding the apply(..., margin = 1, ..)

library(tidyverse)

stack(df[-c(1:3)]) %>% 
 mutate(values = replace(values, duplicated(values), NA)) %>% 
 unstack() %>% 
 bind_cols(df[c(1:3)], .)

which gives,

  ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
1 12   m   1.81         1223           NA           NA           NA
2 13   f   1.65         5664         4667           NA           NA
3 15   m   1.78         6663           NA           NA           NA

Upvotes: 1

Related Questions