JSON7555
JSON7555

Reputation: 37

Delete rows conditionally in data table

I have a large dataset with several rows that include multiple customerIDs as follows.

DT

|customerID | V1| V2 | 
|:---------:|:---:|:------:|
|A| 24| RT|
|A| 56| ES|
|B| 42| GS|
|B| 43| XC|
|B| 46| XZ|
|C| 42| GE|
|C| 25| WD|
|D| 34| XZ|
|D| 19| RF|
|E| 19| DW|
|E| 37| XS|
|F| 44| ZS|
|G| 23| AA|

I want to delete rows with certain customerIDs (e.g. A, C, E) so that it looks like this:

DT

|customerID | V1| V2 | 
|:---------:|:---:|:------:|
|B| 42| GS|
|B| 43| XC|
|B| 46| XZ|
|D| 34| XZ|
|D| 19| RF|
|F| 44| ZS|
|G| 23| AA|

I know one way to do it is to use the following method:

dt1 <- dt[customerID != A & customerID != C & customerID != E]

However, because my actual dataset contains several customerIDs, I want to find a more efficient way instead.

My goal was to create a vector with the customerIDs I want to exclude as follows:

exclude <- c(A, C, E)

Then use the vector to conditionally remove the customerIDs:

dt1 <- dt[customerID != exclude]

However, when I run this in R, nothing in the table changes.

Any assistance would be appreciated.

Upvotes: 1

Views: 434

Answers (1)

akrun
akrun

Reputation: 887118

We may use %chin% (or %in%) with negate (!)

library(data.table)
exclude <- c("A", "C", "E")
dt[!customerID %chin% exclude]

-output

  customerID    V1     V2
       <char> <int> <char>
1:          B    42     GS
2:          B    43     XC
3:          B    46     XZ
4:          D    34     XZ
5:          D    19     RF
6:          F    44     ZS
7:          G    23     AA

== or != are elementwise operators which works best when the length of the lhs/rhs are the same or the rhs value is of length 1 (which recycles) or else the recycling will check on rows that gives undesriable results i.e. i.e. first element of 'exclude' will compare to first element of customerID, 2nd element to 2nd element,..., 1st element again to 3rd element of customerID and so on..

data

dt <- structure(list(customerID = c("A", "A", "B", "B", "B", "C", "C", 
"D", "D", "E", "E", "F", "G"), V1 = c(24L, 56L, 42L, 43L, 46L, 
42L, 25L, 34L, 19L, 19L, 37L, 44L, 23L), V2 = c("RT", "ES", "GS", 
"XC", "XZ", "GE", "WD", "XZ", "RF", "DW", "XS", "ZS", "AA")),
 class = c("data.table", 
"data.frame"), row.names = c(NA, -13L))

Upvotes: 4

Related Questions