munale
munale

Reputation: 27

trying to subset a data table in R by removing items that are in a 2nd table

I have two data frames (from a csv file) in R as such:

df1 <- data.frame(V1 = 1:9, V2 = LETTERS[1:9])
df2 <- data.frame(V1 = 1:3, V2 = LETTERS[1:3])

I convert both to data.table as follows:

dt1 <- data.table(df1, key="V1")
dt2 <- data.table(df2, key="V1")

I want to now return a table that looks like dt1 but without any rows where the key is found in dt2. So in this instance I would like to get back:

4 D
5 E
... 
9 I

I'm using the following code in R:

dt3 <- dt1[!dt2$V1]

this works on this example, however when I try it for a large data set (100k) it does not work. It only removes 2 rows, and I know it should be a lot more than that. is there a limit to this type of operation or something else I havent considered?

Upvotes: 0

Views: 64

Answers (3)

Frank
Frank

Reputation: 66819

Because the tables are keyed, you can do this with a "not-join"

dt1 <- data.table(rep(1:3,2), LETTERS[1:6], key="V1")
#    V1 V2
# 1:  1  A
# 2:  1  D
# 3:  2  B
# 4:  2  E
# 5:  3  C
# 6:  3  F
dt2 <- data.table(1:2, letters[1:2], key="V1")
#    V1 V2
# 1:  1  a
# 2:  2  b

dt1[!.(dt2$V1)]
#    V1 V2
# 1:  3  C
# 2:  3  F

According to the documentation, . or J should not be necessary, since the ! alone is enough:

All types of i may be prefixed with !. This signals a not-join or not-select should be performed.

However, the OP's code does not work:

dt1[!(dt2$V1)]
#    V1 V2
# 1:  2  B
# 2:  2  E
# 3:  3  C
# 4:  3  F

In this case, dt2$V1 is read as a vector of row numbers, not as part of a join. Looks like this is what is meant by a "not-select", but I think it could be more explicit. When I read the sentence above, for all I know "not-select" and "not-join" are two terms for the same thing.

Upvotes: 1

Zachary
Zachary

Reputation: 319

Drop the column name "V1" to do a not-join. The tables are already keyed by V1.

dt3 <- dt1[!dt2]

Upvotes: 1

jeremycg
jeremycg

Reputation: 24955

You could try:

dt1[!(dt1$V1 %in% dt2$V1)]

This assumes that you don't care about ordering.

Upvotes: 0

Related Questions