Reputation: 141
I have a dataset in R which looks like this:
x1 x2 x3
1: A Away 2
2: A Home 2
3: B Away 2
4: B Away 1
5: B Home 2
6: B Home 1
7: C Away 1
8: C Home 1
Based on the values in columns x1 and x2, I want to remove the duplicate rows. I have tried the following:
df[!duplicated(df[,c('x1', 'x2')]),]
It should remove rows 4 and 6, but it does not work: it returns exactly the same data, with the duplicates still present. What should I use to remove rows 4 and 6?
Upvotes: 9
Views: 3322
Reputation: 12723
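library("data.table")
# Convert df to a data.table in place, group by (x1, x2) and keep the first row (.SD[1]) of each group
setDT(df)[, .SD[1], by = .(x1, x2)]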
# x1 x2 x3
# 1: A Away 2
# 2: A Home 2
# 3: B Away 2
# 4: B Home 2
# 5: C Away 1
# 6: C Home 1
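A variant of the same idea, not part of the original answer: instead of building .SD for every group, collect the row number of the first row in each (x1, x2) group with .I and subset once. This is a sketch that rebuilds the question's data as a data.table (the values are copied from the question; the construction itself is an assumption).

```r
library(data.table)

# Example data from the question, rebuilt as a data.table
df <- data.table(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# .I holds df's row numbers; .I[1] is the first row number in each
# (x1, x2) group, returned in a column named V1
first_rows <- df[, .I[1], by = .(x1, x2)]$V1

# Subset df once by those row numbers; rows 4 and 6 are dropped
res <- df[first_rows]
res
```

Groups are returned in order of first appearance, so the rows come back in their original order.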
Upvotes: 3
Reputation: 118879
I'd just do:
unique(df, by=c("x1", "x2")) # where df is a data.table
This'd have been quite obvious if you'd just looked at ?unique.
PS: given the syntax in your Q, I wonder if you are aware of the basic differences between data.table and data.frame syntax. I suggest you read the vignettes first.
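The PS likely points at the cause: in data.table releases of that time, df[, c('x1', 'x2')] did not select those two columns the way it does on a data.frame unless with = FALSE was given, so the filter in the question never actually looked at x1 and x2. A sketch of two data.table spellings of the question's duplicated() approach, assuming df is rebuilt from the values shown in the question:

```r
library(data.table)

# Example data from the question, rebuilt as a data.table
df <- data.table(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# data.table's duplicated() method takes a `by` argument naming the key
# columns; rows repeating an earlier (x1, x2) pair are flagged TRUE
dup <- duplicated(df, by = c("x1", "x2"))
res <- df[!dup]

# Same result with the question's column-subset approach: with = FALSE
# makes the character vector be treated as column names
res2 <- df[!duplicated(df[, c("x1", "x2"), with = FALSE])]
```

Both keep the first occurrence of each (x1, x2) pair, which is also what unique(df, by = c("x1", "x2")) does.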
Upvotes: 7
Reputation: 1781
Or you can use dplyr:
library("dplyr")
# Example data from the question
df <- data.frame(x1 = c("A","A","B","B","B","B","C","C"),
                 x2 = c("Away","Home","Away","Away","Home","Home","Away","Home"),
                 x3 = c(2,2,2,1,2,1,1,1))
# Keep the first row for each (x1, x2) pair; .keep_all = TRUE retains x3
distinct(df, x1, x2, .keep_all = TRUE)
# x1 x2 x3
# 1 A Away 2
# 2 A Home 2
# 3 B Away 2
# 4 B Home 2
# 5 C Away 1
# 6 C Home 1
Upvotes: 2