lukehawk

Reputation: 1493

How do I exclude columns from a data.table?

I have a data.table, and want to exclude some set of columns. For example,

library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)
dt[ , .(b, c)]

This gives me the second and third columns, b and c. How do I instead EXCLUDE columns b and c? Coming from the data.frame world, I would expect something like the following:

dt[ , -.(b, c)]

or, maybe

dt[ , !.(b, c)]

But neither of these works. I know I can use

dt[ , -c(2:3), with = FALSE]

but this just (as I understand it) casts the data.table as a data.frame and then uses the standard operations. I would like to avoid this, since it a) is kind of cheating, and b) gives up the speed boosts available in data.table. I reviewed the data.table FAQ and the vignette, and cannot seem to find anything.

(I know this is all very simplistic, and I could just select the other two columns. However, this is a microcosm of a much, MUCH bigger data.table I am working with.)

Upvotes: 25

Views: 32745

Answers (6)

cach dies

Reputation: 341

You can always just do:

dt[ , -c("b", "c")]

although this uses the data.frame syntax and has the problems you describe; in particular, it seems to be much slower on large data sets.

Upvotes: 4

Shrivathsa

Reputation: 101

With R and the data.table package, you can pass the index numbers of the columns you wish to EXCLUDE from your data.table object to the c function, each prefixed with a minus sign "-".

With respect to the sample of code you have shared,

    dt <- dt[, c(-2, -3)]  # 2 and 3 are the positions of columns "b" and "c"

Note: replace 2 and 3 with the index numbers of the columns you wish to exclude.

Personally, I would not recommend using column indices to deselect columns, as it is not good practice; this was stated by the co-author of the data.table package, Arunkumar Srinivasan, in a DataCamp course on data.table. Using column names instead:

    dt <- dt[, -c("b", "c")]

Note: replace "b" and "c" with the names of the columns you wish to exclude.
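If positions are needed anyway, one option is to derive them from the column names rather than hard-coding them. This is only a minimal sketch using the dt from the question; drop_idx and res are illustrative names, not part of data.table.

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)

# Look up the positions of the unwanted columns by name (illustrative helper variable)
drop_idx <- which(names(dt) %in% c("b", "c"))

# Negative positions drop those columns; with = FALSE makes j be treated as column positions
res <- dt[, -drop_idx, with = FALSE]

names(res)
```

This way the code keeps working if the column order of dt changes.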

Upvotes: 1

ira

Reputation: 2644

Also, if you do not want to modify the data.table but merely want to return all columns except some, you can do:

dt[,.SD, .SDcols = !c('b', 'c')]

which returns the required result of:

   a d
1: 1 4
2: 2 5

while dt remains unchanged:

> dt
   a b c d
1: 1 2 3 4
2: 2 3 4 5

Upvotes: 29

Deb

Reputation: 539

Another way using set:

set(dt,, c("b", "c"), NULL)
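Note that set() deletes the columns by reference, so dt itself is changed. A minimal sketch, assuming you want to keep the original intact, is to work on a copy() first (dt2 is only an illustrative name):

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)

# copy() makes a deep copy, so the removal below does not touch dt
dt2 <- copy(dt)

# i is left at its default; j names the columns; value = NULL deletes them by reference
set(dt2, j = c("b", "c"), value = NULL)

names(dt2)  # remaining columns of the copy
names(dt)   # columns of the original
```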

Upvotes: 1

akrun

Reputation: 887221

We can use setdiff

dt[, setdiff(names(dt), c("b", "c")), with = FALSE]

or we can assign to NULL (as in the other answer) but in a single step

dt[, c("b", "c") := NULL][]
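The two approaches differ in whether dt is modified. A short sketch of the difference, assuming the dt from the question (keep and res are illustrative names):

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)

# setdiff() + with = FALSE returns a new data.table; dt keeps all of its columns
keep <- setdiff(names(dt), c("b", "c"))
res <- dt[, keep, with = FALSE]
cols_after_subset <- names(dt)

# := NULL deletes the columns from dt itself; the trailing [] only prints the result
dt[, c("b", "c") := NULL][]
cols_after_assign <- names(dt)
```

The first form suits cases where the full table is still needed afterwards; the second avoids creating a second table.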

Upvotes: 10

rafa.pereira

Reputation: 13817

You can do:

dt[ , b := NULL][ , c := NULL]

or you can use a character vector of the columns to be removed:

xx <- c("b", "c") # vector of columns you DON'T want

# subset
dt <- dt[, !xx, with = FALSE]

Upvotes: 5
