Daniel Cho

Reputation: 827

How to drop observations conditionally with same values in R

I'm trying to subset a data frame on an age condition, but the condition has to be evaluated across multiple observations at once.

The data frame has 10 observations and three variables: 'Household_id', 'Household_relation', and 'Age'. 'Household_id' is a number uniquely assigned to each household. 'Household_relation' is the position of a person in the household: '1' means the person is the head of the household, and '2' means he/she is the spouse of the head. 'Age' is the age of the person.

    Household_id     Household_relation    Age 
1            2                1            27
2            2                2            34  
3            4                1            22
4            4                2            23
5            7                2            21
6            7                1            29  
7            9                1            33  
8            9                2            34
9           11                1            31
10          11                2            29

So the data consists of couples, one per household. I want to drop couples in which neither person is in their 20s. If at least one of them is in their 20s, both stay (so household id 2 stays). If neither is in their 20s, I want to drop both from the data (for example, household id 9 should be dropped). So the subsetting has to be conditional on two observations at a time.

Since my real data has more than 10,000 observations, the code should be concise and work across the whole data set. I tried to do this with a 'for' loop, but couldn't figure out how.

How can I do this procedure in R?

Below is my reproducible example code.

Household_id <- c(2,2,4,4,7,7,9,9,11,11)
Household_relation <- c(1,2,1,2,2,1,1,2,1,2)
Age <- c(27,34,22,23,21,29,33,34,31,29)
data <- data.frame(Household_id, Household_relation, Age)

Upvotes: 1

Views: 1532

Answers (2)

Ronak Shah

Reputation: 388907

In dplyr we can use filter to keep the groups that have any member in their 20s.

library(dplyr)
data %>%
   group_by(Household_id) %>%
   filter(any(Age >= 20 & Age < 30))

# Household_id  Household_relation   Age
#         <dbl>              <dbl> <dbl>
#1            2                  1    27
#2            2                  2    34
#3            4                  1    22
#4            4                  2    23
#5            7                  2    21
#6            7                  1    29
#7           11                  1    31
#8           11                  2    29

The base R equivalent with ave would be

data[as.logical(ave(data$Age, data$Household_id,
                    FUN = function(x) any(x >= 20 & x < 30))), ]
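If the nested ave call is hard to read, the same idea can be spelled out step by step in base R. This is only a sketch using the example data from the question; it swaps ave for tapply, and the intermediate names (in_twenties, keep_household, ids_to_keep, result) are made up for illustration.

```r
# Rebuild the example data from the question
data <- data.frame(
  Household_id       = c(2, 2, 4, 4, 7, 7, 9, 9, 11, 11),
  Household_relation = c(1, 2, 1, 2, 2, 1, 1, 2, 1, 2),
  Age                = c(27, 34, 22, 23, 21, 29, 33, 34, 31, 29)
)

# TRUE for each person aged 20 to 29
in_twenties <- data$Age >= 20 & data$Age < 30

# One TRUE/FALSE per household: is any member in that range?
keep_household <- tapply(in_twenties, data$Household_id, any)

# Ids of the households that pass the check (tapply names are character)
ids_to_keep <- names(keep_household)[keep_household]

# Keep every row whose household passed
result <- data[as.character(data$Household_id) %in% ids_to_keep, ]
result
```

Because the check is done once per household and then mapped back to the rows, no explicit for loop is needed, which should also hold up for the larger data set.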

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can, of course, translate this to "data.table" like:

library(data.table)
as.data.table(data)[, .SD[any(Age >= 20 & Age < 30)], Household_id]
#    Household_id Household_relation Age
# 1:            2                  1  27
# 2:            2                  2  34
# 3:            4                  1  22
# 4:            4                  2  23
# 5:            7                  2  21
# 6:            7                  1  29
# 7:           11                  1  31
# 8:           11                  2  29
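A variant of the same data.table idea, as a sketch rather than a recommendation: returning .SD only when the group passes the check avoids subsetting .SD by a logical value. It assumes the data.table package is installed; the names dt and res are made up for illustration.

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(
  Household_id       = c(2, 2, 4, 4, 7, 7, 9, 9, 11, 11),
  Household_relation = c(1, 2, 1, 2, 2, 1, 1, 2, 1, 2),
  Age                = c(27, 34, 22, 23, 21, 29, 33, 34, 31, 29)
)

# For each household, return its rows only if any member is aged 20 to 29;
# groups that fail the check return NULL and drop out of the result
res <- dt[, if (any(Age >= 20 & Age < 30)) .SD, by = Household_id]
res
```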

Upvotes: 2
