Reputation: 27
I have two data frames (from a csv file) in R as such:
df1 <- data.frame(V1 = 1:9, V2 = LETTERS[1:9])
df2 <- data.frame(V1 = 1:3, V2 = LETTERS[1:3])
I convert both to data.table
as follows:
dt1 <- data.table(df1, key="V1")
dt2 <- data.table(df2, key="V1")
I want to now return a table that looks like dt1
but without any rows where the key is found in dt2
. So in this instance I would like to get back:
4 D
5 E
...
9 I
I'm using the following code in R:
dt3 <- dt1[!dt2$V1]
this works on this example, however when I try it for a large data set (100k) it does not work. It only removes 2 rows, and I know it should be a lot more than that. is there a limit to this type of operation or something else I havent considered?
Upvotes: 0
Views: 64
Reputation: 66819
Because the tables are keyed, you can do this with a "not-join"
dt1 <- data.table(rep(1:3,2), LETTERS[1:6], key="V1")
# V1 V2
# 1: 1 A
# 2: 1 D
# 3: 2 B
# 4: 2 E
# 5: 3 C
# 6: 3 F
dt2 <- data.table(1:2, letters[1:2], key="V1")
# V1 V2
# 1: 1 a
# 2: 2 b
dt1[!.(dt2$V1)]
# V1 V2
# 1: 3 C
# 2: 3 F
According to the documentation, .
or J
should not be necessary, since the !
alone is enough:
All types of
i
may be prefixed with!
. This signals a not-join or not-select should be performed.
However, the OP's code does not work:
dt1[!(dt2$V1)]
# V1 V2
# 1: 2 B
# 2: 2 E
# 3: 3 C
# 4: 3 F
In this case, dt2$V1
is read as a vector of row numbers, not as part of a join. Looks like this is what is meant by a "not-select", but I think it could be more explicit. When I read the sentence above, for all I know "not-select" and "not-join" are two terms for the same thing.
Upvotes: 1
Reputation: 319
Drop the column name "V1" to do a not-join. The tables are already keyed by V1.
dt3 <- dt1[!dt2]
Upvotes: 1
Reputation: 24955
You could try:
dt1[!(dt1$V1 %in% dt2$V1)]
This assumes that you don't care about ordering.
Upvotes: 0