Duck

Reputation: 39585

How to remove rows from a dataframe when one column contains duplicates

Hi, I have a little problem with a dataframe that has duplicates in one column. I would like to remove the rows where that column contains duplicate values. For example, my dataframe looks like this:

Value   City       Card.Type   ID
100     Michigan   Silver      001
120     Angeles    Gold        002
NA      Kansas     Gold        002
500     Michigan   Silver      001
800     Texas      Basic       005

You can see that the ID column has two duplicates, one for 001 and one for 002. I was using the unique function, but it doesn't remove those duplicates. I would like to get something like this:

Value   City       Card.Type   ID
100     Michigan   Silver      001
120     Angeles    Gold        002
800     Texas      Basic       005

Thanks for your help.

Upvotes: 1

Views: 3060

Answers (2)

IRTFM

Reputation: 263301

which should only be used in its "positive" form. The danger in the -which() construction is that when no rows or items match the test, which() returns numeric(0), and indexing with -numeric(0) returns 'nothing' when the correct result is 'everything'. Use:

 dat[!duplicated(dat), ]  
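A minimal sketch of that pitfall on a plain vector (the vector `x` and the test are illustrative, not from the question):

```r
x <- c(1, 2, 3)

# No element matches, so which() returns an empty index vector
idx <- which(x > 10)      # integer(0)

# Indexing with -integer(0) drops EVERYTHING instead of nothing
x[-idx]                   # numeric(0): all elements lost

# The logical form behaves correctly when nothing matches
x[!(x > 10)]              # c(1, 2, 3): all elements kept
```

This is why the answer recommends logical subsetting with `!duplicated(...)` rather than `-which(...)`.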

In this case there were no fully duplicated rows, but the OP thought some should be removed, so evidently only a subset of the columns was under consideration. This is easy to accommodate: just run the duplication test on those two or three columns:

 dat[ !duplicated(dat[ , 2:3] ) , ]
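Applied to the question's data, the same pattern with the ID column (column 4) as the key produces exactly the requested result; the column choice here is an assumption based on the OP's example, and `2:3` above is the generic form:

```r
# Reconstruct the OP's dataframe
dat <- data.frame(
  Value     = c(100, 120, NA, 500, 800),
  City      = c("Michigan", "Angeles", "Kansas", "Michigan", "Texas"),
  Card.Type = c("Silver", "Gold", "Gold", "Silver", "Basic"),
  ID        = c("001", "002", "002", "001", "005")
)

# Keep only the first row for each ID
dat[!duplicated(dat$ID), ]
#   Value     City Card.Type  ID
# 1   100 Michigan    Silver 001
# 2   120  Angeles      Gold 002
# 5   800    Texas     Basic 005
```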

Upvotes: 4

dayne

Reputation: 7774

Use the function duplicated.

Something like:

data.subset <- data[!duplicated(data$ID),]

duplicated returns a logical (TRUE/FALSE) vector. Every occurrence of a value after its first returns TRUE.
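For the ID column in the question, that logical vector looks like this (a quick sketch using the OP's values):

```r
ids <- c("001", "002", "002", "001", "005")

# First occurrences are FALSE; repeats are TRUE
duplicated(ids)
# FALSE FALSE  TRUE  TRUE FALSE

# Negating it gives the mask that keeps one row per ID
!duplicated(ids)
#  TRUE  TRUE FALSE FALSE  TRUE
```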

Upvotes: 3
