MERose

Reputation: 4421

Subset with vector specifying columns to drop

Let's say we have a simple data frame like this:

df <- read.table(text = "
colA colB colC colD
1    2    3    4
5    6    7    8
", header = TRUE, sep = "")

It has often been explained that one can store the names of the columns to keep in a vector:

rows_to_select <- c("colA", "colB")

Subsetting with subset(df, select=rows_to_select) yields the expected outcome.

But why can't I simply invert the selection by putting a minus in front, i.e. subset(df, select=-rows_to_select)? It fails with: Error in -keep : invalid argument to unary operator. Calls: subset -> subset.data.frame -> eval -> eval

However, subset(df, select=-c(colA, colB)) works. Do I always have to employ setdiff, e.g. keep <- setdiff(names(df), rows_to_select) so that I can subset(df, select=keep)?

Upvotes: 2

Views: 1576

Answers (3)

jazzurro

Reputation: 23574

The dplyr package supports the kind of subsetting you are after: its select() function lets you drop columns by putting a minus sign in front of them.

library(dplyr)

v1 <- 1:10
v2 <- 11:20
v3 <- rep(c("ana", "bob"), each = 5)
v4 <- letters[1:10]

foo <- data.frame(v1, v2, v3, v4, stringsAsFactors = FALSE)

# Remove columns v2 and v3
select(foo, -c(v2:v3))

#   v1 v4
#1   1  a
#2   2  b
#3   3  c
#4   4  d
#5   5  e
#6   6  f
#7   7  g
#8   8  h
#9   9  i
#10 10  j
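The example above drops columns by bare name. For the case in the question, where the names are stored in a character vector, a minimal sketch follows. It assumes dplyr 1.0.0 or later, which re-exports the all_of() helper; drop_cols is a name made up for illustration.

```r
library(dplyr)

foo <- data.frame(
  v1 = 1:10,
  v2 = 11:20,
  v3 = rep(c("ana", "bob"), each = 5),
  v4 = letters[1:10],
  stringsAsFactors = FALSE
)

# Hypothetical vector holding the names of the columns to drop.
drop_cols <- c("v2", "v3")

# all_of() turns the character vector into a selection, which can be negated.
remaining <- select(foo, -all_of(drop_cols))
names(remaining)
```

all_of() raises an error if any listed name is missing from the data frame, which tends to catch typos early.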

Upvotes: 1

Rich Scriven

Reputation: 99331

You can't use a minus sign with a character vector, but you can use one with a numeric index vector. You'd also be better off using [ for subsetting rather than subset().

To get an index, we can use which.

rows <- c("colA", "colB")
df[, -which(names(df) %in% rows)]
#   colC colD
# 1    3    4
# 2    7    8
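One edge case is worth knowing with the negated which() index: if none of the names match, which() returns an empty vector and the subset drops every column instead of none. A short sketch of this, where missing_cols is a made-up vector containing a name that is not in df:

```r
df <- data.frame(colA = c(1, 5), colB = c(2, 6), colC = c(3, 7), colD = c(4, 8))

# Hypothetical name that does not exist in df, to trigger the edge case.
missing_cols <- c("colZ")

# which() returns integer(0); negating it is still integer(0),
# so this selects zero columns rather than keeping all four.
idx <- which(names(df) %in% missing_cols)
n_with_which <- ncol(df[, -idx, drop = FALSE])

# A logical mask handles the no-match case: every column is kept.
keep <- !(names(df) %in% missing_cols)
n_with_mask <- ncol(df[, keep, drop = FALSE])

c(n_with_which, n_with_mask)
```

The logical mask (or setdiff on the names, as mentioned in the question) avoids the problem because it never relies on negating an empty index.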

Upvotes: 2

nrussell

Reputation: 18602

There are several different ways you could accomplish this, and you are not limited to just the subset function. For example,

Df <- data.frame(
  colA=1:4,
  colB=5:8,
  colC=9:12,
  colD=13:16)
##
rows_to_select <- c("colA", "colB")
##
Df[, !(names(Df) %in% rows_to_select)]
#   colC colD
# 1    9   13
# 2   10   14
# 3   11   15
# 4   12   16

Subsetting data frames with [ is also more efficient than calling subset(). But to address your question of

why can't I simply invert the keep-sign by putting a minus in front

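that is just a result of R's language structure. The unary minus operator is not defined for character vectors, so -rows_to_select fails before any subsetting happens. subset(df, select=-c(colA, colB)) works because subset() evaluates the select expression in a context where each column name stands for its integer position, so the minus is applied to numbers.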
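The difference can be reproduced by hand. The sketch below assumes that subset() resolves select against a list mapping column names to positions, which is how subset.data.frame is written in base R. The names positions and result are made up for illustration.

```r
df <- data.frame(colA = c(1, 5), colB = c(2, 6), colC = c(3, 7), colD = c(4, 8))
rows_to_select <- c("colA", "colB")

# Map each column name to its position, mirroring what subset() does internally.
positions <- as.list(seq_along(df))
names(positions) <- names(df)

# Bare names resolve to integers here, so unary minus is valid.
negated <- eval(quote(-c(colA, colB)), positions)
negated

# A character vector has no unary minus, so this raises an error;
# tryCatch captures the message instead of stopping the script.
result <- tryCatch(-rows_to_select, error = function(e) conditionMessage(e))
result
```

In the first case the minus sees integer positions and yields a negative index; in the second it sees strings and has nothing sensible to do.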

Upvotes: 0
