Reputation: 861
I have a dataset of variables
Year Age Bag Computer
2008 0 4 4
2008 1 5 3
2008 2 5 12.5
2008 3 5 15
2008 4 5 33
2008 5 5 11
2008 85 5 3.5
2008 . . .
2008 108 0 0
2008 109 0 0
2008 110+ 0 0
I need to subset this in R in order to remove all the zero on my database and get this final table
Year Age Bag Computer
2008 0 4 4
2008 1 5 3
2008 2 5 12.5
2008 3 5 15
2008 4 5 33
2008 5 5 11
2008 7 5 14.5
2008 8 5 17
without zero for the older ages.
Upvotes: 2
Views: 197
Reputation: 59970
Looks like you want to subset such that if you have a zero in any
column except Age
the row is removed. I'd use apply
to run over the rows and use a logical comparison to see if any
value (except age) is equal to 0
So I'd do this (assuming database is called df
):
# Return a logical vector. TRUE if any value in row == 0
# x[-2] removes the df$Age column from comparison
idx <- apply( df , 1 , function(x) any( x[-2] == 0 ) )
# Use this to subset
df[ !idx , ]
# Year Age Bag Computer
#1 2008 0 4 4.0
#2 2008 1 5 3.0
#3 2008 2 5 12.5
#4 2008 3 5 15.0
#5 2008 4 5 33.0
#6 2008 5 5 11.0
#7 2008 85 5 3.5
Because @Arun complained I was being slow (I was - but I was going for convenience code that would easily allow you to stick a dataframe or matrix with lots of columns to test on) I throw in the canonical (and fastest!) way of subsetting this would be:
df[ df$Bag == 0 | df$Computer == 0 , ]
Which was implicitly covered in the answer @Konrad gave anyway.
Upvotes: 1
Reputation: 263301
If you want to identify the rows at which the 0 appears in columns Bag or Computer (assuming your dataframe is named dat
, you would use:
bad.rows <- which(dat$Bag==0 | dat$Computer==0)
You could subset them out as:
subset(dat, !rownames(dat) %in% bad.rows)
Or you could skip the step of identifying the rows and just use subset:
subset(dat, Bag!=0 & Computer!=0)
Note the negation of the equalities meant one might need to switch to using "&" (or perhaps that was not what you wanted.) Your description was a bit vague on that aspect. Could be you only wanted to remove them if both were zero or remove ages which had all zeros above a particular age.
subset(dat, !(Bag==0 & Computer==0) ) # ages with any non-zero
Upvotes: 2
Reputation: 118779
Another way:
df[with(df, complete.cases(cbind(Bag, Computer)/0)), ]
Benchmarking on bigger data:
set.seed(45)
sz <- 1e6
df <- data.frame(Year=sample(1930:2013, sz, replace=TRUE),
Age=sample(100, sz, replace=TRUE),
Bag = sample(0:5, sz, TRUE),
Computer=sample(0:10, sz, TRUE))
simon <- function(dt) {
idx <- apply( dt , 1 , function(x) any( x[-2] == 0 ) )
dt[ !idx , ]
}
dwin_konrad <- function(dt) {
subset(dt, Bag != 0 & Computer != 0)
}
arun <- function(dt) {
dt[with(dt, complete.cases(cbind(Bag, Computer)/0)), ]
}
require(microbenchmark)
microbenchmark(o1 <- simon(df), o2 <- dwin_konrad(df), o3 <- arun(df), times=5)
Unit: milliseconds
expr min lq median uq max neval
o1 <- simon(df) 15971.7720 16348.4055 16540.6794 18153.9090 18443.5480 5
o2 <- dwin_konrad(df) 402.7010 419.3139 494.9592 723.1468 745.5400 5
o3 <- arun(df) 320.8689 324.0388 334.0515 335.8886 366.6647 5
identical(o1, o2) # TRUE
identical(o1, o3) # TRUE
Upvotes: 1
Reputation: 545508
To subset a table, use … well, subset
:
newTable <- subset(oldTable, Bag != 0)
Or, equivalently,
newTable <- oldTable[oldTable$Bag != 0, ]
It’s unclear what exactly your criterion is. If you want to remove any row where either Bar
or Computer
is 0, then you can combine the condition:
newTable <- subset(oldTable, Bag != 0 & Computer != 0)
(The other syntax also works.)
Upvotes: 1