Reputation: 25580
I have a vector with repeated elements, and would like to remove them so that each element appears only once.
In Python I could construct a Set from a vector to achieve this, but how can I do this in R?
Upvotes: 55
Views: 82254
Reputation: 4430
setdiff(A, B) returns the unique elements of A that are not in B, so you can pass NA as the second argument to drop duplicates (note that this also removes any NA values from the vector):
> v = c(1, 1, 5, 5, 2, 2, 6, 6, 1, 3)
> setdiff(v,NA)
[1] 1 5 2 6 3
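One caveat may be worth checking before relying on this: since NA is the element being excluded, a vector that itself contains NA loses it, whereas unique() keeps it. A minimal sketch, where the vector w is made up purely for illustration:

```r
# w is a hypothetical example vector containing missing values
w <- c(1, 1, NA, 5, 5, NA)

res_setdiff <- setdiff(w, NA)  # NA is the element to exclude, so it is dropped
res_unique  <- unique(w)       # NA is kept as one distinct value

print(res_setdiff)
print(res_unique)
```

If the data can contain missing values that should survive, unique() is probably the safer choice.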
Upvotes: 0
Reputation: 11419
To remove contiguous duplicated elements only, you can compare the vector with a shifted version of itself:
v <- c(1, 1, 5, 5, 5, 5, 2, 2, 6, 6, 1, 3, 3)
v[c(TRUE, v[-length(v)] != v[-1])]
[1] 1 5 2 6 1 3
The same can be written a little more elegantly using dplyr:
library(dplyr)
v[v != lag(v)]
[1] NA 5 2 6 1 3
Because lag() returns NA in the first position, the first value is lost and shows up as NA. To keep it, set default to a value that differs from the first element (the !v[1] trick below assumes a numeric vector):
v[v != lag(v, default = !v[1])]
[1] 1 5 2 6 1 3
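As a possible alternative for collapsing contiguous repeats without dplyr, base R's rle() (run-length encoding) exposes the value of each run directly. This is a swapped-in technique rather than the approach shown in this answer; the names runs and collapsed are illustrative only:

```r
v <- c(1, 1, 5, 5, 5, 5, 2, 2, 6, 6, 1, 3, 3)

# rle() splits v into runs of equal adjacent values
runs <- rle(v)

# $values holds one entry per run; $lengths holds how long each run was
collapsed <- runs$values

print(collapsed)
print(runs$lengths)
```

The run lengths come for free here, which may help if you also need to know how many times each value repeated in a row.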
Upvotes: 7
Reputation: 1154
You can use the unique() function.
> v = c(1, 1, 5, 5, 2, 2, 6, 6, 1, 3)
> unique(v)
[1] 1 5 2 6 3
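Unlike a Python set, the result keeps the order in which each value first appears. A short sketch of a few common follow-ups, assuming the same v as above (the variable names are illustrative):

```r
v <- c(1, 1, 5, 5, 2, 2, 6, 6, 1, 3)

first_seen <- unique(v)          # distinct values in order of first appearance
sorted     <- sort(unique(v))    # distinct values in ascending order
n_distinct <- length(unique(v))  # number of distinct values

print(first_seen)
print(sorted)
print(n_distinct)
```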
Upvotes: 88
Reputation: 5274
This does the same thing. It is slower, but useful if you also want a logical vector of the duplicates:
v[!duplicated(v)]
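To make the role of the logical vector concrete, here is a small sketch using the example vector from the other answers. duplicated() marks every repeat after the first occurrence, so negating it selects the unique elements; the names is_dup, kept and repeats are illustrative:

```r
v <- c(1, 1, 5, 5, 2, 2, 6, 6, 1, 3)

is_dup  <- duplicated(v)  # TRUE for each element already seen earlier in v
kept    <- v[!is_dup]     # first occurrences only, same result as unique(v)
repeats <- v[is_dup]      # the elements that get dropped

print(is_dup)
print(kept)
print(repeats)
```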
Upvotes: 11