Reputation: 2825
I have a data set which looks like the following (partially):
id name dummy
1 Jane 1
1 Jane 0
1 Jane 1
2 Mike 0
2 Mike 0
2 Mike 0
2 Mike 0
2 Mike 0
3 Tom 1
3 Tom 1
3 Tom 0
3 Tom 0
I'm trying to eliminate the people where ALL of the variable dummy
is 0. So for instance, Tom
and Jane
would not be eliminated because they have dummy
variable 0 or 1, but Mike
will be eliminated because he has all 0s. So I would want in the end
id name dummy
1 Jane 1
1 Jane 0
1 Jane 1
3 Tom 1
3 Tom 1
3 Tom 0
3 Tom 0
I thought about sorting the data frame according to dummy
but I can't seem to figure out how to deal with the fact that I'm only trying to eliminate the people who only has 0 values for the variable dummy
. Any suggestions would be really helpful!
Upvotes: 3
Views: 2320
Reputation: 81753
A possible solution:
subset(dat, as.logical(ave(dummy, id, FUN = any)))
# id name dummy
# 1 1 Jane 1
# 2 1 Jane 0
# 3 1 Jane 1
# 9 3 Tom 1
# 10 3 Tom 1
# 11 3 Tom 0
# 12 3 Tom 0
An alternative with data.table
:
library(data.table)
setDT(dat)[, if (any(dummy)) .SD, by = id]
Or with dplyr
:
library(dplyr)
dat %>%
group_by(id) %>%
filter(any(dummy))
Upvotes: 2
Reputation: 61214
Consider df
is your data.frame, then use tapply
and [
to subset what you want:
> ind <- with(df, tapply(dummy, name, sum))
> df[df$name %in% names(ind)[ind!=0], ]
id name dummy
1 1 Jane 1
2 1 Jane 0
3 1 Jane 1
9 3 Tom 1
10 3 Tom 1
11 3 Tom 0
12 3 Tom 0
Another alternative:
> result <- split(df, df$name)[with(df, tapply(dummy, name, function(x) sum(x)!=0))]
> do.call(rbind, result)
Upvotes: 3
Reputation: 18437
You can use plyr::ddply
too
require(plyr)
ddply(df, .(name), function(x) subset(x, !all(dummy == 0)))
## id name dummy
## 1 1 Jane 1
## 2 1 Jane 0
## 3 1 Jane 1
## 4 3 Tom 1
## 5 3 Tom 1
## 6 3 Tom 0
## 7 3 Tom 0
Note that it's possible to replace !all(dummy == 0)
by any(dummy != 0)
Upvotes: 1
Reputation: 42689
ave
can be used to produce the groupwise sum, while keeping the original position:
x[with(x, ave(dummy, name, FUN=sum))>0,]
## id name dummy
## 1 1 Jane 1
## 2 1 Jane 0
## 3 1 Jane 1
## 9 3 Tom 1
## 10 3 Tom 1
## 11 3 Tom 0
## 12 3 Tom 0
ave
is something like aggregate
but copies the aggregated value for each row in the set:
with(x, ave(dummy, name, FUN=sum))
## [1] 2 2 2 0 0 0 0 0 2 2 2 2
Upvotes: 1