user1631306

Using apply in a data.table for conditional removal of rows

I have a data table, dt:

           V1                      V2             V3   PubMedCounts
1: 0000100005                100-00-5     CAS Number              6
2: 0000100005 1-Chloro-4-nitrobenzene DescriptorName             12
3: 0000100005                    aahs DescriptorName            111
4: 0000100005                    PNCB        Synonym             35

I also have a data table, ew, with a single column of words, like:

          V1
1:       aah
2:     aahed
3:    aahing
4:      aahs
5:  aardvark

From dt, I need to remove every row whose V2 value has 5 or fewer characters or appears in ew.

For example, from dt I would remove the 3rd and 4th rows.

I would like to use an apply function to make this efficient, as it's a pretty big data set.
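As a minimal sketch of the rule above, assuming the four example V2 values and the five example words are copied into plain character vectors (the names `v2` and `words` are hypothetical), both conditions can be evaluated in base R. Because `nchar()` and `%in%` already operate on whole vectors, this step does not need an `apply()` loop:

```r
# Hypothetical vectors holding the example values of dt$V2 and ew$V1
v2 <- c("100-00-5", "1-Chloro-4-nitrobenzene", "aahs", "PNCB")
words <- c("aah", "aahed", "aahing", "aahs", "aardvark")

# A row is dropped when V2 has 5 or fewer characters OR appears in the word list
drop <- nchar(v2) <= 5 | v2 %in% words

# Row positions removed by the rule
which(drop)
#> [1] 3 4
```

Row 3 ("aahs") fails both conditions, and row 4 ("PNCB") fails only the length test.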


Answers (1)

MichaelChirico

If I understand you correctly, I would do:

dt[!ew, on = c(V2 = "V1")][nchar(V2) > 5]

which gives:

       V1                      V2             V3 PubMedCounts
1: 100005                100-00-5     CAS_Number            6
2: 100005 1-Chloro-4-nitrobenzene DescriptorName           12
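To see what each part of that chain does, here is a sketch that splits it into two steps. It assumes data.table is installed and rebuilds the example with `data.table()` instead of `structure()`; the names `step1` and `step2` are only illustrative. `dt[!ew, on = c(V2 = "V1")]` is an anti-join: it keeps the rows of `dt` whose `V2` has no match in `ew$V1`.

```r
library(data.table)

# Rebuilt example data (V1 kept as character, as printed in the question)
dt <- data.table(
  V1 = rep("0000100005", 4),
  V2 = c("100-00-5", "1-Chloro-4-nitrobenzene", "aahs", "PNCB"),
  V3 = c("CAS_Number", "DescriptorName", "DescriptorName", "Synonym"),
  PubMedCounts = c(6L, 12L, 111L, 35L)
)
ew <- data.table(V1 = c("aah", "aahed", "aahing", "aahs", "aardvark"))

# Step 1: anti-join, dropping rows whose V2 equals some ew$V1
step1 <- dt[!ew, on = c(V2 = "V1")]
nrow(step1)  # 3: only "aahs" matched a word in ew

# Step 2: keep only rows whose V2 is longer than 5 characters
step2 <- step1[nchar(V2) > 5]
step2$V2     # "100-00-5" "1-Chloro-4-nitrobenzene"
```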

Applying the conditions in the other order might be faster:

dt[nchar(V2) > 5][!ew, on = c(V2 = "V1")]

This avoids matching rows of dt against ew when the length filter would remove them in the next step anyway.
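One way to confirm that the reordered chain returns the same rows is to compare both results with `all.equal()`, for which data.table provides a method. This sketch assumes the same rebuilt example data; `a` and `b` are hypothetical names:

```r
library(data.table)

dt <- data.table(
  V1 = rep("0000100005", 4),
  V2 = c("100-00-5", "1-Chloro-4-nitrobenzene", "aahs", "PNCB"),
  V3 = c("CAS_Number", "DescriptorName", "DescriptorName", "Synonym"),
  PubMedCounts = c(6L, 12L, 111L, 35L)
)
ew <- data.table(V1 = c("aah", "aahed", "aahing", "aahs", "aardvark"))

# Anti-join first, then the length filter
a <- dt[!ew, on = c(V2 = "V1")][nchar(V2) > 5]

# Length filter first, so the join only sees rows that survive it
b <- dt[nchar(V2) > 5][!ew, on = c(V2 = "V1")]

isTRUE(all.equal(a, b))
```

On a large table the second form can be cheaper because the join runs on fewer rows, but whether it is actually faster depends on the data, so timing both on the real set is the safer check.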

A third possibility is using:

dt[nchar(V2) > 5 & !(V2 %chin% ew$V1)]
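`%chin%` is data.table's character-only counterpart of base R's `%in%`, documented as faster for character vectors. The sketch below, again assuming the rebuilt example data, stores the combined condition in a logical vector (`keep` and `keep_base` are hypothetical names) and checks that swapping in base `%in%` selects the same rows:

```r
library(data.table)

dt <- data.table(
  V1 = rep("0000100005", 4),
  V2 = c("100-00-5", "1-Chloro-4-nitrobenzene", "aahs", "PNCB"),
  V3 = c("CAS_Number", "DescriptorName", "DescriptorName", "Synonym"),
  PubMedCounts = c(6L, 12L, 111L, 35L)
)
ew <- data.table(V1 = c("aah", "aahed", "aahing", "aahs", "aardvark"))

# TRUE for rows to keep: V2 longer than 5 characters and not a word in ew
keep <- nchar(dt$V2) > 5 & !(dt$V2 %chin% ew$V1)

# The same test written with base R's %in%
keep_base <- nchar(dt$V2) > 5 & !(dt$V2 %in% ew$V1)
stopifnot(identical(keep, keep_base))

# Subset with the logical vector
res <- dt[keep]
res$V2
```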

Data used:

dt <- structure(list(V1 = c(100005L, 100005L, 100005L, 100005L), V2 = c("100-00-5", 
"1-Chloro-4-nitrobenzene", "aahs", "PNCB"), V3 = c("CAS_Number", 
"DescriptorName", "DescriptorName", "Synonym"), PubMedCounts = c(6L, 
12L, 111L, 35L)), .Names = c("V1", "V2", "V3", "PubMedCounts"
), row.names = c(NA, -4L), class = c("data.table", "data.frame"))

ew <- structure(list(V1 = c("aah", "aahed", "aahing", "aahs", "aardvark")), .Names = "V1", row.names = c(NA, -5L), class = c("data.table", "data.frame"))

