Reputation: 31
I'm trying to flag duplicate IDs in another column. I don't necessarily want to remove them yet, just create an indicator (0/1) of whether each ID is unique or a duplicate. In SQL, it would be something like this:
UPDATE TABLE
JOIN (SELECT ID, count(ID) count FROM TABLE GROUP BY ID) a
ON TABLE.ID = a.ID
SET ID Duplicate Flag Column 1 = 1
WHERE count > 1;
Is there a simple way to do this in R? Any help would be greatly appreciated.
Upvotes: 2
Views: 4561
Reputation: 6784
As an example of duplicated(), let's start with some values (numbers here, but strings would behave the same way):
x <- c(9, 1:5, 3:7, 0:8)
x
# 9 1 2 3 4 5 3 4 5 6 7 0 1 2 3 4 5 6 7 8
If you want to flag only the second and later copies:
as.numeric(duplicated(x))
# 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0
If you want to flag all values that occur two or more times:
as.numeric(x %in% x[duplicated(x)])
# 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
Upvotes: 1