Reputation: 4421
Let's say we have a simple data frame like
df <- read.table(text="
colA colB colC colD
1 2 3 4
5 6 7 8
",header=TRUE,sep="")
It has often been explained that one can store the names of the columns to keep in a vector:
rows_to_select <- c("colA", "colB")
Subsetting with subset(df, select=rows_to_select)
yields the expected outcome.
But why can't I simply invert the keep-sign by putting a minus in front, i.e. subset(df, select=-rows_to_select)? It gives the error
Error in -rows_to_select : invalid argument to unary operator
Calls: subset -> subset.data.frame -> eval -> eval
However, subset(df, select=-c(colA, colB)) works. Do I always have to employ setdiff, e.g. keep <- setdiff(names(df), rows_to_select), so that I can call subset(df, select=keep)?
Upvotes: 2
Views: 1576
Reputation: 23574
The dplyr package supports this style of subsetting: select() accepts a minus sign in front of the columns to drop.
library(dplyr)
v1 <- 1:10
v2 <- 11:20
v3 <- rep(c("ana", "bob"), each = 5)
v4 <- letters[1:10]
foo <- data.frame(v1, v2, v3, v4, stringsAsFactors = FALSE)
# Remove columns v2 and v3
select(foo, -c(v2:v3))
# v1 v4
#1 1 a
#2 2 b
#3 3 c
#4 4 d
#5 5 e
#6 6 f
#7 7 g
#8 8 h
#9 9 i
#10 10 j
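Since the question stores the column names in a character vector, here is a minimal sketch of the same idea with names held in a variable. It assumes dplyr 1.0.0 or later, where the all_of() helper is available; cols_to_drop is a hypothetical name chosen for illustration.

```r
library(dplyr)

foo <- data.frame(
  v1 = 1:10,
  v2 = 11:20,
  v3 = rep(c("ana", "bob"), each = 5),
  v4 = letters[1:10],
  stringsAsFactors = FALSE
)

# Hypothetical character vector holding the names of the columns to drop
cols_to_drop <- c("v2", "v3")

# all_of() turns the character vector into a selection, which can then be negated
result <- select(foo, -all_of(cols_to_drop))
names(result)
```

all_of() raises an error if any listed name is missing from the data frame, which is usually what you want when dropping columns by name.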
Upvotes: 1
Reputation: 99331
You can't use a minus sign with a character vector, but you can with a numeric index vector. Furthermore, you'd be better off using [-type subsetting.
To get the index, we can use which.
> rows <- c("colA", "colB")
> df[, -which(names(df) %in% rows)]
# colC colD
# 1 3 4
# 2 7 8
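One caveat with the -which() idiom, sketched below: if none of the names match, which() returns an empty vector and the negative index selects no columns at all. The names missing_cols, colX and colY are hypothetical, chosen only to trigger that case; logical negation or setdiff() behave more predictably.

```r
df <- data.frame(colA = c(1, 5), colB = c(2, 6), colC = c(3, 7), colD = c(4, 8))

# Hypothetical case: none of the requested names exist in df
missing_cols <- c("colX", "colY")

# which() returns integer(0); negating it is still integer(0),
# so every column is dropped instead of none
dropped_all <- df[, -which(names(df) %in% missing_cols), drop = FALSE]

# Logical negation keeps all columns when nothing matches
kept_all <- df[, !(names(df) %in% missing_cols), drop = FALSE]

# setdiff() on the names works directly with a character vector
kept_some <- df[, setdiff(names(df), c("colA", "colB")), drop = FALSE]
```

drop = FALSE is included so the result stays a data frame even when only one column remains.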
Upvotes: 2
Reputation: 18602
There are several different ways you could accomplish this, and you are not limited to just the subset function. For example,
Df <- data.frame(
colA=1:4,
colB=5:8,
colC=9:12,
colD=13:16)
##
rows_to_select <- c("colA", "colB")
##
> Df[,!(names(Df) %in% rows_to_select)]
colC colD
1 9 13
2 10 14
3 11 15
4 12 16
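Subsetting data.frames using [ is also more efficient than calling subset(). But to address your question of
why can't I simply invert the keep-sign by putting a minus in front
that is a result of how subset() evaluates its select argument: column names are treated as stand-ins for column positions, so -c(colA, colB) becomes a negative numeric index, whereas rows_to_select is a character vector, and unary minus is not defined for character values.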
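The difference can be reproduced by hand. The sketch below is a simplified imitation of what subset.data.frame does with select, not its full code; cols_to_drop and the other variable names are chosen for illustration.

```r
df <- data.frame(colA = c(1, 5), colB = c(2, 6), colC = c(3, 7), colD = c(4, 8))
cols_to_drop <- c("colA", "colB")

# Build a list that maps each column name to its position
nl <- as.list(seq_along(df))
names(nl) <- names(df)

# Unquoted names resolve to positions, so the minus acts on numbers
idx <- eval(quote(-c(colA, colB)), nl)
kept <- df[, idx, drop = FALSE]

# A character vector cannot be negated, which is the error from the question
err <- tryCatch(-cols_to_drop, error = function(e) conditionMessage(e))
```

Here idx holds negative positions and kept contains only colC and colD, while err captures the error message instead of stopping the script.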
Upvotes: 0