Reputation: 1493
I have a data.table, and want to exclude some set of columns. For example,
library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)
dt[ , .(b, c)]
gives me the second and third columns, b and c. How do I instead EXCLUDE columns b and c? Coming from the data.frame world, I would expect something like the following:
dt[ , -.(b, c)]
or, maybe
dt[ , !.(b, c)]
But neither of these works. I know I can use
dt[ , -c(2:3), with = FALSE]
but this just (as I understand it) casts the data.table as a data.frame and then uses the standard operations. I would like to avoid this, since it a) is kind of cheating, and b) gives up the speed boosts available in data.table. I reviewed the data.table FAQ and the vignette, and cannot seem to find anything.
(I know this is all very simplistic, and I could just select the other two columns. However, this is a microcosm of a much, MUCH bigger data.table I am working with.)
Upvotes: 25
Views: 32745
Reputation: 341
You can always just do:
dt[ , -c("b", "c")]
although this uses the data.frame syntax and has the problems you describe; in particular, it seems to be much slower on large data sets.
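The speed concern is easy to check on your own data. Below is a minimal timing sketch, not a benchmark result: the table `big`, its size, and the default column names `V1` to `V50` are made up for illustration, and the timings will vary by machine and data.table version.

```r
library(data.table)

# Hypothetical test table: 100,000 rows and 50 numeric columns (V1..V50).
n_rows <- 1e5
big <- as.data.table(matrix(runif(n_rows * 50), nrow = n_rows))
drop_cols <- c("V2", "V3")  # illustrative columns to exclude

# Time two ways of returning every column except drop_cols.
t_negate <- system.time(res_negate <- big[, -c("V2", "V3")])
t_sdcols <- system.time(res_sdcols <- big[, .SD, .SDcols = !drop_cols])

cat("negated names:", t_negate[["elapsed"]], "seconds\n")
cat("negated .SDcols:", t_sdcols[["elapsed"]], "seconds\n")

# Both calls should return the same 48 columns.
stopifnot(identical(names(res_negate), names(res_sdcols)))
```

A single `system.time()` call is noisy, so repeat the measurement a few times before drawing conclusions.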
Upvotes: 4
Reputation: 101
Using R and the data.table package, you can pass the index numbers of the columns you wish to EXCLUDE from your data.table object to the c function, each prefixed with a minus sign "-".
With respect to the sample of code you have shared,
dt <- dt[, c(-1, -2)]  # 1 and 2 are the indices of columns "a" and "b"
Note: replace 1 and 2 with the index numbers of the columns you wish to exclude.
Personally, I would not recommend using column indices to deselect columns, as it is not good practice; this was stated by Arunkumar Srinivasan, a co-author of the data.table package, in a DataCamp course on data.table. Use the column names instead:
dt <- dt[, -c("a", "b")]
Note: replace "a" and "b" with the names of the columns you wish to exclude.
Upvotes: 1
Reputation: 2644
Also, if you do not want to modify the data.table but only return all columns except some, you can do:
dt[,.SD, .SDcols = !c('b', 'c')]
which returns the required result of:
   a d
1: 1 4
2: 2 5
while dt remains unchanged:
> dt
   a b c d
1: 1 2 3 4
2: 2 3 4 5
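The same idea also works when the names to exclude are stored in a variable. A minimal sketch, where `drop_cols` and `kept` are hypothetical names chosen for illustration:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)
drop_cols <- c("b", "c")  # hypothetical variable holding the names to exclude

# Negating .SDcols returns a new data.table without the listed columns.
kept <- dt[, .SD, .SDcols = !drop_cols]

names(kept)  # "a" "d"
names(dt)    # "a" "b" "c" "d" (dt itself is untouched)
```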
Upvotes: 29
Reputation: 887221
We can use setdiff
dt[, setdiff(names(dt), c("b", "c")), with = FALSE]
or we can assign to NULL (as in the other answer), but in a single step. Note that this modifies dt by reference:
dt[, c("b", "c") := NULL][]
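The two approaches differ in whether the original table survives. The sketch below illustrates this, assuming you want to keep a backup before deleting by reference; the names `kept` and `backup` are made up for the example.

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)

# setdiff() builds the vector of columns to keep; the result is a new object.
kept <- dt[, setdiff(names(dt), c("b", "c")), with = FALSE]
names(dt)  # still "a" "b" "c" "d"

# := NULL removes the columns in place; copy() first if the original is needed.
backup <- copy(dt)
dt[, c("b", "c") := NULL]
names(dt)      # "a" "d"
names(backup)  # "a" "b" "c" "d"
```

Deleting by reference avoids copying the remaining columns, which is why it is usually preferred for very large tables when the dropped columns are no longer needed.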
Upvotes: 10
Reputation: 13817
You can do:
dt[ , b := NULL][ , c := NULL]
or you can use a character vector of the columns to be removed:
xx <- c("b","c") # vector of columns you DON'T want
# subset
dt <- dt[, !xx, with = FALSE]
Upvotes: 5