ltong

Reputation: 543

R: How to remove duplicate entries across columns within each row

I have a data frame that looks like the following. Within each row, I would like to remove duplicate entries across the columns X1 to Xn, so that each value appears only once per row.

df <- data.frame(ID = c("100", "101", "102"),
                 X1 = c("C23.2", "C23.2", "A79.1"),
                 X2 = c("C23.2", NA, "A79.1"),
                 X3 = c("A19.2", NA, "A79.1"))

The output would look something like this:

   ID    X1    X2    X3
1 100 C23.2 A19.2  <NA>
2 101 C23.2  <NA>  <NA>
3 102 A79.1  <NA>  <NA>

Upvotes: 0

Views: 74

Answers (2)

akrun

Reputation: 886938

In base R, use apply to loop over the rows, keep the non-duplicated elements of each row, and pad the result with NA back to the original length:

df[-1] <- t(apply(df[-1], 1, \(x) `length<-`(x[!duplicated(x)], length(x))))

Output:

> df
   ID    X1    X2   X3
1 100 C23.2 A19.2 <NA>
2 101 C23.2  <NA> <NA>
3 102 A79.1  <NA> <NA>
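
To make the one-liner easier to follow, here is a minimal sketch of what happens to a single row, using the values of ID 100 from the question. The intermediate names x, kept and padded are only for illustration and are not part of the answer above.

```r
# One row of the example data (ID 100), as a named character vector
x <- c(X1 = "C23.2", X2 = "C23.2", X3 = "A19.2")

# Keep only the first occurrence of each value
kept <- x[!duplicated(x)]

# Extend back to the original length; `length<-` fills the new slots with NA
padded <- `length<-`(kept, length(x))

unname(padded)
# "C23.2" "A19.2" NA
```

Note that duplicated() treats repeated NA values as duplicates too, so a row such as c("C23.2", NA, NA) ends up as one value followed by NA padding.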

Upvotes: 3

Quinten

Reputation: 41225

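Using pmap_dfr from purrr, which replaces each repeated value in a row with NA and leaves the other values in their original columns:

library(dplyr)
library(purrr)

df %>%
  # For each row, combine the values into one vector and set repeats to NA
  pmap_dfr(., ~c(...) %>% replace(., duplicated(.), NA)) %>%
  bind_cols(select(df), .)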

Output:

   ID    X1   X2    X3
1 100 C23.2 <NA> A19.2
2 101 C23.2 <NA>  <NA>
3 102 A79.1 <NA>  <NA>
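
In this output the NA stays where the duplicate was, whereas the question's expected output has the remaining values moved to the left. A minimal base R sketch that does both is shown here; the helper name dedupe_row and the variable code_cols are hypothetical, and the sketch assumes every column other than ID is a character column.

```r
df <- data.frame(ID = c("100", "101", "102"),
                 X1 = c("C23.2", "C23.2", "A79.1"),
                 X2 = c("C23.2", NA, "A79.1"),
                 X3 = c("A19.2", NA, "A79.1"))

# Hypothetical helper: unique non-NA values first, then NA padding
dedupe_row <- function(x) {
  vals <- unique(x[!is.na(x)])
  c(vals, rep(NA_character_, length(x) - length(vals)))
}

# Assumption: all columns except ID hold the codes to de-duplicate
code_cols <- setdiff(names(df), "ID")

# apply() returns one column per input row, so transpose before assigning
df[code_cols] <- t(apply(df[code_cols], 1, dedupe_row))
df
```

Dropping the NA values before calling unique() also closes gaps in rows where an NA sits between two distinct values.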

Upvotes: 2
