Reputation: 335
I need to do some set operations using only data.table library.
I don't want to join/merge, but set operations. So we do not match column by key, but add rows from different data.tables.
We work on two samples from built-in data sets.
All of the problematic questions are below:
#install.packages("nycflights13")
library(nycflights13)
#install.packages("sqldf")
library(sqldf)
#install.packages("data.table")
library(data.table)
dt_airports <- data.table(airports)
Adt <- dt_airports[1:10]
Bdt <- dt_airports[6:15]
sqldf('SELECT * FROM A INTERSECT SELECT * FROM B')
# can I do it better using data.table?
rbindlist(list(Adt,Bdt))[duplicated(rbindlist(list(Adt,Bdt))),]
sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')
# how to do it using data.table?
sqldf('SELECT * FROM B EXCEPT SELECT * FROM A')
# how to do it using data.table?
Upvotes: 2
Views: 549
Reputation: 371
To elaborate on Frank's comment, data.table v1.9.8
and above has built-in rows set operation. I would prefer fsetdiff
or fintersect
solution as there is no need to set keys.
#first problem
fintersect(Adt,Bdt)
#second problem
fsetdiff(Adt,Bdt)
#third problem
fsetdiff(Bdt,Adt)
It was also claimed to be fast: https://github.com/Rdatatable/data.table/blob/master/NEWS.0.md
Upvotes: 2
Reputation: 335
Using only data.table:
sqldf('SELECT * FROM A INTERSECT SELECT * FROM B')
# data.table
rbindlist(list(Adt,Bdt))[duplicated(rbindlist(list(Adt,Bdt))),]
# better in data.table
setkeyv(Adt,colnames(Adt)[1:7])
Adt[Bdt, nomatch=0L]
sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')
# data.table
setkeyv(Adt,colnames(Adt)[1:7])
Adt[!Bdt]
sqldf('SELECT * FROM B EXCEPT SELECT * FROM A')
# data.table
setkeyv(Bdt,colnames(Bdt)[1:7])
Bdt[!Adt]
Upvotes: 1