Reputation: 39585
Hi, I have a small problem with a data frame that has duplicates in a column. I would like to remove the rows where a column contains duplicated values. For example, my data frame looks like this:
Value City Card.Type ID
100 Michigan Silver 001
120 Angeles Gold 002
NA Kansas Gold 002
500 Michigan Silver 001
800 Texas Basic 005
You can see that in the ID
column there are two duplicates: one for 001
and one for 002
. I was using the unique
function but I couldn't erase those duplicates. I would like to get something like this:
Value City Card.Type ID
100 Michigan Silver 001
120 Angeles Gold 002
800 Texas Basic 005
Thanks for your help.
Upvotes: 1
Views: 3060
Reputation: 263301
The use of which should only be done with its "positive" version. The danger in the construction -which() is that when none of the rows or items match the test, the result of which()
is numeric(0)
, and indexing with -numeric(0)
will return 'nothing', when the correct result is 'everything'. Instead use:
dat[!duplicated(dat), ]
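The pitfall can be seen with a small hypothetical vector `x`; this is a sketch, not code from the answer:

```r
x <- c(1, 2, 3)
idx <- which(x > 10)   # no element matches the test
idx                    # integer(0)
x[-idx]                # numeric(0): drops everything, though nothing matched
x[!(x > 10)]           # logical indexing gives the correct result: 1 2 3
```

The logical form `x[!cond]` behaves correctly in the empty-match case, which is why `!duplicated(...)` is preferred over `-which(duplicated(...))`.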
In this case there were no fully duplicated rows, but the OP thought that some should be removed, so evidently only two or three columns were under consideration. This is easy to accommodate: just do the duplication test on those two or three columns:
dat[ !duplicated(dat[ , 2:3] ) , ]
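Applied to a reconstruction of the OP's data (column names and the NA value taken from the question; the exact columns to test on are a judgment call, here Card.Type and ID):

```r
dat <- data.frame(
  Value     = c(100, 120, NA, 500, 800),
  City      = c("Michigan", "Angeles", "Kansas", "Michigan", "Texas"),
  Card.Type = c("Silver", "Gold", "Gold", "Silver", "Basic"),
  ID        = c("001", "002", "002", "001", "005")
)

# Keep only the first row for each Card.Type/ID combination
dat[!duplicated(dat[, c("Card.Type", "ID")]), ]
```

This returns the three rows the OP asked for (IDs 001, 002, 005), each being the first occurrence of its Card.Type/ID pair.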
Upvotes: 4
Reputation: 7774
Use the function duplicated
.
Something like:
data.subset <- data[!duplicated(data$ID),]
duplicated returns a TRUE/FALSE vector: every occurrence of a value after the first returns TRUE
.
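A minimal sketch of that behaviour, using the ID values from the question:

```r
ids <- c("001", "002", "002", "001", "005")
duplicated(ids)
# FALSE FALSE  TRUE  TRUE FALSE
# The first "001" and "002" are FALSE; their later repeats are TRUE,
# so negating the vector keeps exactly one row per ID.
```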
Upvotes: 3