Reputation: 6106
I have a data frame in R that looks like this:
ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA 1223
13 f 1.65 5664 4667 NA 4667
15 m 1.78 6663 NA 6663 NA
For each row, I want to only keep the unique variables among the four coordinate.x variables, and the duplicates should be replaced with NAs. The result should look like this:
ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA NA
13 f 1.65 5664 4667 NA NA
15 m 1.78 6663 NA NA NA
Any ideas on how to achieve this?
Upvotes: 0
Views: 59
Reputation: 388982
Using apply
for every row we replace
the values which are duplicated
with NA
.
cols <- grep("^coordinate", names(df))
df[cols] <- t(apply(df[cols], 1, function(x) replace(x, duplicated(x), NA)))
df
# ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
#1 12 m 1.81 1223 NA NA NA
#2 13 f 1.65 5664 4667 NA NA
#3 15 m 1.78 6663 NA NA NA
A tidyverse
approach would be by creating a row_number()
for every row, gather
all coordinate...
values group_by
the row number (ind
), replace
duplicates with NA
and spread
the values again in wide format.
library(tidyverse)
df %>%
mutate(ind = row_number()) %>%
gather(key, value, -(c(ind, ID:height))) %>%
group_by(ind) %>%
mutate(value = replace(value, duplicated(value), NA)) %>%
spread(key, value) %>%
ungroup() %>%
select(-ind)
# ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
# <int> <fct> <dbl> <int> <int> <int> <int>
#1 12 m 1.81 1223 NA NA NA
#2 13 f 1.65 5664 4667 NA NA
#3 15 m 1.78 6663 NA NA NA
Upvotes: 1
Reputation: 51592
Another idea for fun avoiding the apply(..., margin = 1, ..)
library(tidyverse)
stack(df[-c(1:3)]) %>%
mutate(values = replace(values, duplicated(values), NA)) %>%
unstack() %>%
bind_cols(df[c(1:3)], .)
which gives,
ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4 1 12 m 1.81 1223 NA NA NA 2 13 f 1.65 5664 4667 NA NA 3 15 m 1.78 6663 NA NA NA
Upvotes: 1