jamse

Reputation: 354

How to specify multiple conditions that can be subsetted on jointly or separately in R

I want to define an object that holds multiple rules. When I use that object in a call to subset() it should apply all of the rules, but I should also be able to pick out individual rules and apply them separately.

Some sample data:

temp_data <- data.frame(a = c(1,2,3,4,5,6,7,8,9,0),
                        b = c(0,0,0,1,1,1,1,1,1,1),
                        c = c(0,0,1,0,0,1,1,1,1,1),
                        d = c(1,1,1,1,1,2,2,2,2,0),
                        e = c(1,1,1,1,1,1,1,1,0,2))

Subsetting, according to some rules:

# Rule 1
subset(temp_data, b == 1)
# Rule 2
subset(temp_data, c == 1)
# Rule 3
subset(temp_data, d != 0 & e != 0)
# All three rules
subset(temp_data, b == 1 & c == 1 & d != 0 & e != 0)

I have tried collecting the rules together using expression():

temp_subset_rules_1 <- expression(b == 1,
                                  c == 1,
                                  d != 0 & e != 0)

This runs, but doesn't apply all three rules. It just applies the last one.

subset(temp_data, eval(temp_subset_rules_1))

I can run each rule separately:

subset(temp_data, eval(temp_subset_rules_1[1]))
subset(temp_data, eval(temp_subset_rules_1[2]))
subset(temp_data, eval(temp_subset_rules_1[3]))

I can manually run all the rules together:

subset(temp_data, 
       eval(temp_subset_rules_1[1]) &
         eval(temp_subset_rules_1[2]) &
         eval(temp_subset_rules_1[3]))

I have also tried collecting the rules using c() and quote():

temp_subset_rules_2 <- c(quote(b == 1),
                         quote(c == 1),
                         quote(d != 0 & e != 0))

This doesn't run at all:

subset(temp_data, eval(temp_subset_rules_2))
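
The difference between the two attempts seems to come down to what eval() receives. As a minimal base-R sketch (toy_rules and toy_list are illustrative names, not part of the data above): an expression vector is evaluated element by element and only the last value is returned, whereas c() on quoted calls builds a plain list, which eval() returns unchanged, so subset() is handed something that is not a logical vector.

```r
# Illustrative names only; not part of the original data.
toy_rules <- expression(1 + 1, 2 + 2, 3 + 3)
last_value <- eval(toy_rules)   # every element is evaluated; only the last value is returned
print(last_value)

toy_list <- c(quote(b == 1), quote(c == 1))
print(class(toy_list))          # c() on calls gives a plain list, not an expression vector
print(is.list(eval(toy_list)))  # eval() hands a list back as-is
```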

Each rule stored this way has to be extracted with [[ rather than [. So these work separately:

subset(temp_data, eval(temp_subset_rules_2[[1]]))
subset(temp_data, eval(temp_subset_rules_2[[2]]))
subset(temp_data, eval(temp_subset_rules_2[[3]]))

And this works together:

subset(temp_data, 
       eval(temp_subset_rules_2[[1]]) &
         eval(temp_subset_rules_2[[2]]) &
         eval(temp_subset_rules_2[[3]]))

I am looking for an object that, when passed as the second (subset =) argument to subset(), will apply all of its components, but where it remains possible to run each rule separately. I.e., the output of these should be the same:

subset(temp_data, rules_collected_somehow)
subset(temp_data, b == 1 & c == 1 & d != 0 & e != 0)

I should also be able to extract elements from rules_collected_somehow programmatically and use each one on its own to produce results equal to these:

subset(temp_data, b == 1)
subset(temp_data, c == 1)
subset(temp_data, d != 0 & e != 0)

I had considered the possibility of storing the separate rules as a single rule and then splitting it at the &s. But unfortunately that doesn't work because I need some composite rules that have to contain & within them.


I am aware of (though admittedly don't fully grok) the warning to avoid subset() in favour of, for example, [. However, I will sometimes be applying this to survey.design objects (survey::svydesign()), where my understanding is that subset() is the approved / preferred function.

Upvotes: 1

Views: 58

Answers (2)

andrew_reece

Reputation: 21274

Using tidyverse methods, you can create a list of expressions, and then use filter() to apply them all or one at a time.

library(tidyverse)

rule_exprs <- rlang::exprs(b == 1, c == 1, d != 0 & e != 0)

temp_data %>% filter(!!!rule_exprs)
  a b c d e
1 6 1 1 2 1
2 7 1 1 2 1
3 8 1 1 2 1

temp_data %>% filter(!!rule_exprs[[1]])
  a b c d e
1 4 1 0 1 1
2 5 1 0 1 1
3 6 1 1 2 1
4 7 1 1 2 1
5 8 1 1 2 1
6 9 1 1 2 0
7 0 1 1 0 2
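
The same splicing should also work for any subset of the rules, since indexing the list with [ returns another list that !!! can splice. A minimal sketch, assuming dplyr and rlang are installed; picked is a hypothetical name, and the data frame is repeated only to keep the block self-contained:

```r
library(dplyr)

temp_data <- data.frame(a = c(1,2,3,4,5,6,7,8,9,0),
                        b = c(0,0,0,1,1,1,1,1,1,1),
                        c = c(0,0,1,0,0,1,1,1,1,1),
                        d = c(1,1,1,1,1,2,2,2,2,0),
                        e = c(1,1,1,1,1,1,1,1,0,2))

rule_exprs <- rlang::exprs(b == 1, c == 1, d != 0 & e != 0)

# Rules 1 and 3 only: `[` keeps a list, which `!!!` splices into filter()
picked <- temp_data %>% filter(!!!rule_exprs[c(1, 3)])
print(picked)
```

Leaving the rules unnamed is deliberate in this sketch: filter() treats named arguments as a likely mistake.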

Upvotes: 1

MrFlick

Reputation: 206411

Suppose you keep your conditions in an expression vector:

temp_subset_rules_1 <- expression(b == 1,
                                  c == 1,
                                  d != 0 & e != 0)

You can write a helper function that joins multiple terms into one, combining them with &:

# Fold the terms pairwise into a single call of the form (a) & (b)
join_terms <- function(x) {
  Reduce(function(a, b) bquote((.(a)) & (.(b))), x)
}

Then you can use this with subset() and eval():

subset(temp_data, eval(join_terms(temp_subset_rules_1)))            # all three rules
subset(temp_data, eval(join_terms(temp_subset_rules_1[2])))         # rule 2 only
subset(temp_data, eval(join_terms(temp_subset_rules_1[c(1,3)])))    # rules 1 and 3
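
If writing eval() inside every subset() call gets tedious, one possible extension is to build the whole subset() call and evaluate that instead. This is only a sketch: subset_by_rules and the rule names (in_b, in_c, de_ok) are hypothetical, introduced here for illustration. Naming the elements of the expression vector also lets individual rules be picked by name.

```r
temp_data <- data.frame(a = c(1,2,3,4,5,6,7,8,9,0),
                        b = c(0,0,0,1,1,1,1,1,1,1),
                        c = c(0,0,1,0,0,1,1,1,1,1),
                        d = c(1,1,1,1,1,2,2,2,2,0),
                        e = c(1,1,1,1,1,1,1,1,0,2))

# Named rules (names are illustrative), so rules can be selected by name
rules <- expression(in_b  = b == 1,
                    in_c  = c == 1,
                    de_ok = d != 0 & e != 0)

join_terms <- function(x) {
  Reduce(function(a, b) bquote((.(a)) & (.(b))), x)
}

# Hypothetical wrapper: splice the joined condition into a subset() call,
# then evaluate the call, so the caller never writes eval() themselves
subset_by_rules <- function(data, rules) {
  cond <- join_terms(rules)
  eval(bquote(subset(data, .(cond))))
}

all_rules <- subset_by_rules(temp_data, rules)
two_rules <- subset_by_rules(temp_data, rules[c("in_b", "de_ok")])
print(all_rules)
print(two_rules)
```

Because the wrapper calls the subset() generic, the same pattern may carry over to other subset() methods, but that is untested here.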

Upvotes: 0
