Brian

Reputation: 4243

Redefine Data Frame in R

I have a data frame with a column, database$VAR, which has values of 0's and 1's.

How can I redefine the data frame so that the rows where VAR is 1 are removed?

Thanks!

Upvotes: 0

Views: 926

Answers (3)

Chase

Reputation: 69161

TMTOWTDI (there's more than one way to do it).

Using subset:

df.new <- subset(df, VAR == 0)
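To "redefine" the data frame rather than create a second object, the filtered result can be assigned back to the original name. A minimal sketch, assuming made-up data: the id column and its values are hypothetical and only there to show which rows survive.

```r
# Made-up example data standing in for the asker's data frame
database <- data.frame(VAR = c(0, 1, 0, 1, 1), id = 1:5)

# Overwrite the original object with only the rows where VAR is 0
database <- subset(database, VAR == 0)

database$id  # 1 3
```

Note that subset() also drops rows where the condition evaluates to NA.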

EDIT:

David's solution seems to be the fastest on my machine. Subset seems to be the slowest. I won't even pretend to understand what's going on under the hood that accounts for these differences:

> df <- data.frame(y=rep(c(1,0), times=1000000))
> 
> system.time(df[ -which(df[,"y"]==1), , drop=FALSE])
   user  system elapsed 
   0.16    0.05    0.23 
> system.time(df[which(df$y == 0), ])
   user  system elapsed 
   0.03    0.01    0.06 
> system.time(subset(df, y == 0))
   user  system elapsed 
   0.14    0.09    0.27 

Upvotes: 3

David F

Reputation: 1265

I'd upvote the answer using "subset" if I had the reputation for it :-) . You can also use a logical vector directly for subsetting -- no need for "which":

d <- data.frame(VAR = c(0,1,0,1,1))
d[d$VAR == 0, , drop=FALSE]

I'm surprised to find the logical version a little faster in at least one case. (I expected the "which" version might win due to R possibly preallocating the proper amount of storage for the result.)

> d <- data.frame(y=rep(c(1,0), times=1000000))
> system.time(d[which(d$y == 0), ])
   user  system elapsed 
  0.119   0.067   0.188 
> system.time(d[d$y == 0, ])
   user  system elapsed 
  0.049   0.024   0.074 
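For anyone rerunning the comparison, here is a rough sketch that first checks the approaches agree and then times one of them. It uses a smaller data frame than the one above so it finishes quickly, and the names by_logical, by_which, and by_subset are just illustrative. Timings depend on the machine and R version, so none are shown. One thing that may matter when comparing numbers: without drop=FALSE, indexing a one-column data frame by row returns a plain vector, so the calls timed above are not all building a data frame.

```r
# Same shape of test data as above, but smaller
d <- data.frame(y = rep(c(1, 0), times = 1000))

by_logical <- d[d$y == 0, , drop = FALSE]
by_which   <- d[which(d$y == 0), , drop = FALSE]
by_subset  <- subset(d, y == 0)

# All three approaches keep the same values
stopifnot(identical(by_logical$y, by_which$y),
          identical(by_logical$y, by_subset$y))

# Without drop = FALSE, a one-column data frame collapses to a plain vector
is.data.frame(d[d$y == 0, ])                # FALSE
is.data.frame(d[d$y == 0, , drop = FALSE])  # TRUE

# Time the logical version; results vary by machine and R version
system.time(d[d$y == 0, , drop = FALSE])
```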

Upvotes: 2

Dirk is no longer here

Reputation: 368201

Try this:

R> df <- data.frame(VAR = c(0,1,0,1,1))
R> df[ -which(df[,"VAR"]==1), , drop=FALSE]
  VAR
1   0
3   0
R> 

We use which( booleanExpr ) to get the indices for which your condition holds, then negate these with a leading minus sign to exclude those rows, and lastly use drop=FALSE to prevent our one-column data.frame from collapsing into a vector.
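One caveat with the negated which() form is worth a quick sketch: if no row matches the condition, which() returns integer(0), and negating an empty index still selects nothing, so every row is dropped instead of none. The example below uses made-up data, and the names df0, by_which, and by_logical are hypothetical; it compares the negated form with a plain logical test.

```r
# Hypothetical data: VAR contains no 1's at all
df0 <- data.frame(VAR = c(0, 0, 0))

# which() finds no matches, so the row index is -integer(0), an empty index
by_which <- df0[-which(df0[, "VAR"] == 1), , drop = FALSE]

# A logical test keeps every row whose VAR is not 1
by_logical <- df0[df0$VAR != 1, , drop = FALSE]

nrow(by_which)    # 0: every row was dropped
nrow(by_logical)  # 3: nothing needed removing
```

If the column might contain no 1's, the logical form (or subset) is the safer choice.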

Upvotes: 1
