Replace duplicates with NAs within a row across columns

Question

I have a data frame in R that looks like this:

ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA 1223
13 f 1.65 5664 4667 NA 4667
15 m 1.78 6663 NA 6663 NA

For each row, I want to keep only the unique values among the four coordinate.x columns and replace the duplicates with NA. The result should look like this:

ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA NA
13 f 1.65 5664 4667 NA NA
15 m 1.78 6663 NA NA NA

Any ideas on how to achieve this?

Ronak Shah · Accepted Answer

Using apply() on every row, we replace the duplicated values with NA. duplicated() flags every repeat of an earlier value in the row, so only the first occurrence is kept.

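# Indices of the coordinate columns
cols <- grep("^coordinate", names(df))
# apply() returns one column per input row, so transpose back with t()
df[cols] <- t(apply(df[cols], 1, function(x) replace(x, duplicated(x), NA)))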

df
#  ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
#1 12   m   1.81         1223           NA           NA           NA
#2 13   f   1.65         5664         4667           NA           NA
#3 15   m   1.78         6663           NA           NA           NA
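
To see why this works, it may help to look at what duplicated() does on a single row. The following is only a sketch: the question shows df printed, so the data.frame() call and the column types below are assumptions, and x, dup and cleaned are names made up for illustration.

```r
# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID = c(12L, 13L, 15L),
  sex = c("m", "f", "m"),
  height = c(1.81, 1.65, 1.78),
  coordinate.1 = c(1223, 5664, 6663),
  coordinate.2 = c(NA, 4667, NA),
  coordinate.3 = c(NA, NA, 6663),
  coordinate.4 = c(1223, 4667, NA)
)

cols <- grep("^coordinate", names(df))

# First row as a plain vector: 1223, NA, NA, 1223
x <- unlist(df[1, cols])

# duplicated() marks every repeat of an earlier value,
# and a second NA counts as a repeat of the first NA.
dup <- duplicated(x)

# Overwrite the flagged positions, keeping only first occurrences.
cleaned <- replace(x, dup, NA)

# The same step applied to all rows, as in the answer above.
df[cols] <- t(apply(df[cols], 1, function(x) replace(x, duplicated(x), NA)))
```

Because a repeated NA is also flagged, it simply gets overwritten with NA again, which is harmless here.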

A tidyverse approach: create a row number for every row with row_number(), gather() all coordinate.* values into long format, group_by() the row number (ind), replace duplicates with NA, and spread() the values back into wide format.

library(tidyverse)

df %>%
  mutate(ind = row_number()) %>%
  gather(key, value, -(c(ind, ID:height))) %>%
  group_by(ind) %>%
  mutate(value = replace(value, duplicated(value), NA)) %>%
  spread(key, value) %>%
  ungroup() %>%
  select(-ind)


#       ID sex   height coordinate.1 coordinate.2 coordinate.3 coordinate.4
#1       12 m       1.81         1223           NA           NA           NA
#2       13 f       1.65         5664         4667           NA           NA
#3       15 m       1.78         6663           NA           NA           NA
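
In recent tidyr versions gather() and spread() are superseded by pivot_longer() and pivot_wider(), so the same idea could probably be written as below. This is a swapped-in variant, not the code from the answer above, and the data.frame() call is an assumed reconstruction of the printed data; result is a name chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID = c(12L, 13L, 15L),
  sex = c("m", "f", "m"),
  height = c(1.81, 1.65, 1.78),
  coordinate.1 = c(1223, 5664, 6663),
  coordinate.2 = c(NA, 4667, NA),
  coordinate.3 = c(NA, NA, 6663),
  coordinate.4 = c(1223, 4667, NA)
)

result <- df %>%
  # Row identifier, so rows stay distinct even if ID repeats
  mutate(ind = row_number()) %>%
  # Long format: one line per (row, coordinate column)
  pivot_longer(starts_with("coordinate"),
               names_to = "key", values_to = "value") %>%
  group_by(ind) %>%
  # Within each original row, blank out repeats of an earlier value
  mutate(value = replace(value, duplicated(value), NA)) %>%
  ungroup() %>%
  # Back to wide format, one column per coordinate
  pivot_wider(names_from = key, values_from = value) %>%
  select(-ind)
```

The ind column does the same job as in the gather()/spread() version: it keeps each original row as its own group.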
