Asteroid098
Asteroid098

Reputation: 2825

How to select rows according to column value conditions

I have a data set which looks like the following (partially):

id  name    dummy
1   Jane    1
1   Jane    0
1   Jane    1
2   Mike    0
2   Mike    0
2   Mike    0
2   Mike    0
2   Mike    0
3   Tom     1
3   Tom     1
3   Tom     0
3   Tom     0

I'm trying to eliminate the people where ALL of the variable dummy is 0. So for instance, Tom and Jane would not be eliminated because they have dummy variable 0 or 1, but Mike will be eliminated because he has all 0s. So I would want in the end

   id   name    dummy
    1   Jane    1
    1   Jane    0
    1   Jane    1
    3   Tom     1
    3   Tom     1
    3   Tom     0
    3   Tom     0

I thought about sorting the data frame according to dummy but I can't seem to figure out how to deal with the fact that I'm only trying to eliminate the people who only has 0 values for the variable dummy. Any suggestions would be really helpful!

Upvotes: 3

Views: 2320

Answers (4)

Sven Hohenstein
Sven Hohenstein

Reputation: 81753

A possible solution:

subset(dat, as.logical(ave(dummy, id, FUN = any)))

#    id name dummy
# 1   1 Jane     1
# 2   1 Jane     0
# 3   1 Jane     1
# 9   3  Tom     1
# 10  3  Tom     1
# 11  3  Tom     0
# 12  3  Tom     0

An alternative with data.table:

library(data.table)
setDT(dat)[, if (any(dummy)) .SD, by = id]

Or with dplyr:

library(dplyr)
dat %>% 
  group_by(id) %>% 
  filter(any(dummy))

Upvotes: 2

Jilber Urbina
Jilber Urbina

Reputation: 61214

Consider df is your data.frame, then use tapply and [ to subset what you want:

> ind <- with(df, tapply(dummy, name, sum))
> df[df$name %in% names(ind)[ind!=0], ]
   id name dummy
1   1 Jane     1
2   1 Jane     0
3   1 Jane     1
9   3  Tom     1
10  3  Tom     1
11  3  Tom     0
12  3  Tom     0

Another alternative:

> result <- split(df, df$name)[with(df, tapply(dummy, name, function(x) sum(x)!=0))]
> do.call(rbind, result)

Upvotes: 3

dickoa
dickoa

Reputation: 18437

You can use plyr::ddply too

require(plyr)
ddply(df, .(name), function(x) subset(x, !all(dummy == 0)))
##   id name dummy
## 1  1 Jane     1
## 2  1 Jane     0
## 3  1 Jane     1
## 4  3  Tom     1
## 5  3  Tom     1
## 6  3  Tom     0
## 7  3  Tom     0

Note that it's possible to replace !all(dummy == 0) by any(dummy != 0)

Upvotes: 1

Matthew Lundberg
Matthew Lundberg

Reputation: 42689

ave can be used to produce the groupwise sum, while keeping the original position:

x[with(x, ave(dummy, name, FUN=sum))>0,]
##    id name dummy
## 1   1 Jane     1
## 2   1 Jane     0
## 3   1 Jane     1
## 9   3  Tom     1
## 10  3  Tom     1
## 11  3  Tom     0
## 12  3  Tom     0

ave is something like aggregate but copies the aggregated value for each row in the set:

with(x, ave(dummy, name, FUN=sum))
## [1] 2 2 2 0 0 0 0 0 2 2 2 2

Upvotes: 1

Related Questions