Reputation: 141

Remove duplicated rows (based on 2 columns) in R

I have a dataset in R which looks like this:

   x1   x2 x3
1:  A Away  2
2:  A Home  2
3:  B Away  2
4:  B Away  1
5:  B Home  2
6:  B Home  1
7:  C Away  1
8:  C Home  1

Based on the values in columns x1 and x2, I want to remove the duplicate rows. I have tried the following:

df[!duplicated(df[,c('x1', 'x2')]),]

This should remove rows 4 and 6, but it returns exactly the same data, with the duplicates still present. What should I use instead to remove rows 4 and 6?

Upvotes: 9

Answers (3)

Sathish

Reputation: 12723

library("data.table")
# Convert df to a data.table by reference, then keep the first row of each (x1, x2) group
setDT(df)[, .SD[1], by = .(x1, x2)]

#    x1   x2 x3
# 1:  A Away  2
# 2:  A Home  2
# 3:  B Away  2
# 4:  B Home  2
# 5:  C Away  1
# 6:  C Home  1

Upvotes: 3

Arun

Reputation: 118879

I'd just do:

unique(df, by=c("x1", "x2")) # where df is a data.table

This'd have been quite obvious if you'd just looked at ?unique.

PS: given the syntax in your Q, I wonder if you are aware of the basic differences between data.table and data.frame's syntax. I suggest you read the vignettes first.
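As a minimal, reproducible sketch of the call above: the data.table below is rebuilt from the values printed in the question (the construction itself is an assumption, since the question does not show how `df` was created). It also shows the `by` argument of data.table's `duplicated()` method, which gives an equivalent filter. The names `res_unique` and `res_dup` are only illustrative.

```r
library(data.table)

# Rebuild the data from the question (same values as the printed table).
df <- data.table(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# Keep the first row of each (x1, x2) combination.
res_unique <- unique(df, by = c("x1", "x2"))

# Equivalent filter: drop rows whose (x1, x2) pair has already been seen.
res_dup <- df[!duplicated(df, by = c("x1", "x2"))]

print(res_unique)
```

Both forms name the key columns through `by`, so they do not depend on how `df[, c('x1', 'x2')]` is evaluated. In data.table versions before 1.9.8 that expression returned the character vector itself rather than the two columns, which would explain why the attempt in the question removed nothing.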

Upvotes: 7

ArunK

Reputation: 1781

Or you can use the dplyr package:

library("dplyr")
df <- data.frame(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

distinct(df, x1, x2, .keep_all = TRUE)
#   x1   x2 x3
# 1  A Away  2
# 2  A Home  2
# 3  B Away  2
# 4  B Home  2
# 5  C Away  1
# 6  C Home  1
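For comparison, when `df` is a plain data.frame as in this answer, the base R expression from the question should also work without any package. The sketch below assumes the same data as above; `is_dup` and `deduped` are illustrative names.

```r
# Same data as in the question, stored as a plain data.frame.
df <- data.frame(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# TRUE for every row whose (x1, x2) pair already appeared in an earlier row.
is_dup <- duplicated(df[, c("x1", "x2")])

# Keep only the first occurrence of each pair; original row names are preserved.
deduped <- df[!is_dup, ]

print(deduped)
```

Note that the result keeps the original row names (1, 2, 3, 5, 7, 8), whereas `distinct()` renumbers the rows.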

Upvotes: 2
