AgnieszkaTomczyk
AgnieszkaTomczyk

Reputation: 335

Set operations using data.table

I need to do some set operations using only data.table library.

I don't want to join/merge, but set operations. So we do not match column by key, but add rows from different data.tables.

We work on two samples from built-in data sets.

All of the problematic questions are below:

#install.packages("nycflights13")
library(nycflights13)

#install.packages("sqldf")
library(sqldf)

#install.packages("data.table")
library(data.table)

dt_airports <- data.table(airports)

Adt <- dt_airports[1:10]
Bdt <- dt_airports[6:15]

sqldf('SELECT * FROM A INTERSECT SELECT * FROM B')
# can I do it better using data.table?
rbindlist(list(Adt,Bdt))[duplicated(rbindlist(list(Adt,Bdt))),]

sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')
# how to do it using data.table?

sqldf('SELECT * FROM B EXCEPT SELECT * FROM A')
# how to do it using data.table?

Upvotes: 2

Views: 549

Answers (2)

Deen24ID
Deen24ID

Reputation: 371

To elaborate on Frank's comment, data.table v1.9.8 and above has built-in rows set operation. I would prefer fsetdiff or fintersect solution as there is no need to set keys.

#first problem
fintersect(Adt,Bdt)

#second problem
fsetdiff(Adt,Bdt)

#third problem
fsetdiff(Bdt,Adt)

It was also claimed to be fast: https://github.com/Rdatatable/data.table/blob/master/NEWS.0.md

Upvotes: 2

AgnieszkaTomczyk
AgnieszkaTomczyk

Reputation: 335

Using only data.table:

sqldf('SELECT * FROM A INTERSECT SELECT * FROM B')
# data.table
rbindlist(list(Adt,Bdt))[duplicated(rbindlist(list(Adt,Bdt))),]
# better in data.table
setkeyv(Adt,colnames(Adt)[1:7])
Adt[Bdt, nomatch=0L]

sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')
# data.table
setkeyv(Adt,colnames(Adt)[1:7])
Adt[!Bdt]

sqldf('SELECT * FROM B EXCEPT SELECT * FROM A')
# data.table
setkeyv(Bdt,colnames(Bdt)[1:7])
Bdt[!Adt]

Upvotes: 1

Related Questions