Reputation: 213
I have data that looks like this:
ID v1 v2
1 1 0
2 0 1
3 1 0
3 0 1
4 0 1
I want to replace all values with 'NA' if the ID occurs more than once in the data frame. The final product should look like this:
ID v1 v2
1 1 0
2 0 1
3 NA NA
3 NA NA
4 0 1
I could do this by hand, but I want R to detect all the duplicate cases (here, the two rows with ID '3') and replace their values with 'NA'.
Thanks for your help!
Upvotes: 0
Views: 337
Reputation: 23788
One more option:
df1[df1$ID %in% df1$ID[duplicated(df1$ID)], -1] <- NA
#> df1
# ID v1 v2
#1 1 1 0
#2 2 0 1
#3 3 NA NA
#4 3 NA NA
#5 4 0 1
data
df1 <- structure(list(ID = c(1L, 2L, 3L, 3L, 4L), v1 = c(1L, 0L, 1L,
0L, 0L), v2 = c(0L, 1L, 0L, 1L, 1L)), .Names = c("ID", "v1",
"v2"), class = "data.frame", row.names = c(NA, -5L))
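To see why the one-liner also catches the first occurrence of a repeated ID, it can be unpacked into steps. This is only a sketch on the same example data; the intermediate names (later_dupes, dup_ids, rows_to_blank) are made up for illustration and are not part of the answer.

```r
# Same example data as in the answer, rebuilt with data.frame()
df1 <- data.frame(ID = c(1L, 2L, 3L, 3L, 4L),
                  v1 = c(1L, 0L, 1L, 0L, 0L),
                  v2 = c(0L, 1L, 0L, 1L, 1L))

# Step 1: duplicated() flags only the second and later occurrences of a value
later_dupes <- duplicated(df1$ID)

# Step 2: pull out the ID values that occur more than once
dup_ids <- df1$ID[later_dupes]

# Step 3: %in% flags every row carrying one of those IDs,
# including the first occurrence that duplicated() skipped
rows_to_blank <- df1$ID %in% dup_ids

# Step 4: set every column except the first (ID) to NA in those rows
df1[rows_to_blank, -1] <- NA
df1
```

Using duplicated() alone as the row index would blank only the second row with ID 3, which is why the extra %in% step is needed.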
Upvotes: 3
Reputation: 99331
You could use duplicated() from either end, and then replace.
idx <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)
df[idx, -1] <- NA
which gives
  ID v1 v2
1  1  1  0
2  2  0  1
3  3 NA NA
4  3 NA NA
5  4  0  1
This will also work if the duplicated IDs are not next to each other.
Data:
df <- structure(list(ID = c(1L, 2L, 3L, 3L, 4L), v1 = c(1L, 0L, 1L,
0L, 0L), v2 = c(0L, 1L, 0L, 1L, 1L)), .Names = c("ID", "v1",
"v2"), class = "data.frame", row.names = c(NA, -5L))
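As a quick check of the claim about non-adjacent duplicates, here is a small sketch on made-up data where the two rows with ID 3 are separated by other rows. The data frame df_scattered is hypothetical and not from the question.

```r
# Hypothetical data: ID 3 appears in rows 1 and 4, which are not adjacent
df_scattered <- data.frame(ID = c(3L, 1L, 2L, 3L, 4L),
                           v1 = c(1L, 1L, 0L, 0L, 0L),
                           v2 = c(0L, 0L, 1L, 1L, 1L))

# Scanning forward flags row 4; scanning backward (fromLast = TRUE) flags row 1
idx <- duplicated(df_scattered$ID) | duplicated(df_scattered$ID, fromLast = TRUE)

# Blank the non-ID columns in every flagged row
df_scattered[idx, -1] <- NA
df_scattered
```

Because duplicated() compares values rather than neighbouring positions, no sorting is needed beforehand.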
Upvotes: 4
Reputation: 38500
Here is a base R method
# get list of repeated IDs
repeats <- rle(df$ID)$values[rle(df$ID)$lengths > 1]
# set the corresponding variables to NA
df[, -1] <- sapply(df[, -1], function(i) {i[df$ID %in% repeats] <- NA; i})
In the first line, we use rle to extract the repeated IDs. In the second, we use sapply to loop through the non-ID variables and, for each one, replace the values in rows whose ID repeats with NA.
Note that this assumes that the data set is sorted by ID, because rle only detects runs of adjacent equal values. Sorting may be accomplished with the order function: df <- df[order(df$ID), ].
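To illustrate the sorting caveat, here is a minimal sketch on made-up unsorted data. The names df_unsorted, df_sorted, and the runs_* and repeats_* variables are hypothetical and only serve the illustration.

```r
# Hypothetical unsorted data: ID 3 appears in rows 1 and 4
df_unsorted <- data.frame(ID = c(3L, 1L, 2L, 3L, 4L),
                          v1 = c(1L, 1L, 0L, 0L, 0L),
                          v2 = c(0L, 0L, 1L, 1L, 1L))

# Without sorting, every run has length 1, so rle finds no repeated IDs
runs_unsorted <- rle(df_unsorted$ID)
repeats_unsorted <- runs_unsorted$values[runs_unsorted$lengths > 1]

# After sorting by ID, the two 3s are adjacent and form a run of length 2
df_sorted <- df_unsorted[order(df_unsorted$ID), ]
runs_sorted <- rle(df_sorted$ID)
repeats_sorted <- runs_sorted$values[runs_sorted$lengths > 1]

repeats_unsorted  # empty: the repeat is missed
repeats_sorted    # contains the repeated ID
```

Note that sorting changes the row order of the data frame, so if the original order matters it would need to be restored afterwards.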
If the dataset is very large, you might break up the first line into two steps to avoid computing the rle twice:
dfRle <- rle(df$ID)
repeats <- dfRle$values[dfRle$lengths > 1]
data
df <- read.table(header=TRUE, text="ID v1 v2
1 1 0
2 0 1
3 1 0
3 0 1
4 0 1")
Upvotes: 0