Reputation: 37
I have a large dataset with several rows that include multiple customerIDs as follows.
DT
|customerID | V1| V2 |
|:---------:|:---:|:------:|
|A| 24| RT|
|A| 56| ES|
|B| 42| GS|
|B| 43| XC|
|B| 46| XZ|
|C| 42| GE|
|C| 25| WD|
|D| 34| XZ|
|D| 19| RF|
|E| 19| DW|
|E| 37| XS|
|F| 44| ZS|
|G| 23| AA|
I want to delete rows with certain customerIDs (e.g. A, C, E) so that it looks like this:
DT
|customerID | V1| V2 |
|:---------:|:---:|:------:|
|B| 42| GS|
|B| 43| XC|
|B| 46| XZ|
|D| 34| XZ|
|D| 19| RF|
|F| 44| ZS|
|G| 23| AA|
I know one way to do it is to use the following method:
dt1 <- dt[customerID != A & customerID != C & customerID != E]
However, because my actual dataset contains several customerIDs, I want to find a more efficient way instead.
My goal was to create a vector with the customerIDs I want to exclude as follows:
exclude <- c(A, C, E)
Then use the vector to conditionally remove the customerIDs:
dt1 <- dt[customerID != exclude]
However, when I run this in R, nothing in the table changes.
Any assistance would be appreciated.
Upvotes: 1
Views: 434
Reputation: 887118
We may use %chin%
(or %in%
) with negate (!
)
library(data.table)
exclude <- c("A", "C", "E")
dt[!customerID %chin% exclude]
-output
customerID V1 V2
<char> <int> <char>
1: B 42 GS
2: B 43 XC
3: B 46 XZ
4: D 34 XZ
5: D 19 RF
6: F 44 ZS
7: G 23 AA
==
or !=
are elementwise operators which works best when the length of the lhs/rhs are the same or the rhs value is of length 1 (which recycles) or else the recycling will check on rows that gives undesriable results i.e. i.e. first element of 'exclude' will compare to first element of customerID, 2nd element to 2nd element,..., 1st element again to 3rd element of customerID and so on..
dt <- structure(list(customerID = c("A", "A", "B", "B", "B", "C", "C",
"D", "D", "E", "E", "F", "G"), V1 = c(24L, 56L, 42L, 43L, 46L,
42L, 25L, 34L, 19L, 19L, 37L, 44L, 23L), V2 = c("RT", "ES", "GS",
"XC", "XZ", "GE", "WD", "XZ", "RF", "DW", "XS", "ZS", "AA")),
class = c("data.table",
"data.frame"), row.names = c(NA, -13L))
Upvotes: 4