LoveMYMAth

Reputation: 111

Creating a function with multiple arguments that subsets a dataframe [R]

I have a data frame named titanic with 2021 rows, one per passenger on the Titanic, giving characteristics of each passenger:

  Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No
...

I want to create a function with multiple arguments, something like this:

f1 <- function(sex, age, class, survived){
...
}

where each argument is a criterion the passengers must meet. For example, I want to be able to pass criteria to the function so that

f1("Female", "Child","3rd", "Yes")

returns

     Class    Sex   Age Survived
1534   3rd Female Child      Yes
1535   3rd Female Child      Yes
1536   3rd Female Child      Yes
1537   3rd Female Child      Yes
1538   3rd Female Child      Yes

For now, I have hard-coded it, using an if/else chain that covers every possible combination:

function.q6.1 <- function(sex, age, class, survival){
  # && is the idiomatic scalar "and" inside if ()
  if(sex == "Male" && age == "Child" && class == "3rd" && survival == "No"){
    subset(titanic, Sex == "Male" & Age == "Child" & Class == "3rd" & Survived == "No")
  }
  else if(sex == "Female" && age == "Child" && class == "3rd" && survival == "No"){
    subset(titanic, Sex == "Female" & Age == "Child" & Class == "3rd" & Survived == "No")
  }
  else if(sex == "Male" && age == "Adult" && class == "3rd" && survival == "No"){
    subset(titanic, Sex == "Male" & Age == "Adult" & Class == "3rd" & Survived == "No")
  }
  # ...
}

Is there a more efficient way of doing this? Thank you in advance.

Upvotes: 0

Views: 514

Answers (4)

G. Grothendieck

Reputation: 270075

This assumes that the first argument is the data frame and that the remaining arguments are values for the columns, given either in the order the columns appear in the data frame or by name.

There can be fewer arguments than columns. In that case, unnamed arguments are matched against the same number of leading columns of the data frame, while named arguments are matched by name. The arguments after the data frame must either all be named or all be unnamed. If only the data frame is passed, with no other arguments, NULL is returned invisibly.
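These matching rules rest on two small pieces of R behavior, sketched below. The helper resolve_names() is hypothetical: it copies only the name-resolution step of f1, needs R >= 4.1.0 for ...names(), and, like f1, assumes ...names() returns NULL when no argument is named. The last line shows where the invisible NULL comes from: an if () with no else and a FALSE condition (here n == 0) evaluates to NULL invisibly.

```r
# Hypothetical helper: isolates the name-resolution step of f1().
# Requires R >= 4.1.0 for ...names().
resolve_names <- function(dat, ...) {
  n <- ...length()
  nms <- ...names()                             # NULL when no argument is named
  if (is.null(nms)) nms <- head(names(dat), n)  # positional: first n columns
  nms
}

resolve_names(BOD, 1)             # "Time"
resolve_names(BOD, 1, 8.3)        # "Time" "demand"
resolve_names(BOD, demand = 8.3)  # "demand"

# if () with a FALSE condition and no else returns NULL invisibly,
# which is what f1(dat) returns when no criteria are given
withVisible(if (0L) "never")
```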

If there is a nonzero number of arguments after the data frame, we take their names or, if they are unnamed, the first n column names, where n is the number of arguments after the data frame. Then we remove rows containing NAs from dat, on the assumption that such rows cannot match. mapply compares successive columns with successive argument values, returning a logical matrix; apply reduces that matrix to one logical value per row, and we subscript dat by the result.
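To make the mapply/apply steps described above concrete, here is a sketch that runs them by hand on the built-in BOD data with the criteria Time = 1 and demand = 8.3. It also shows an edge case: with a one-row data frame, mapply() simplifies its result to a plain vector, so apply() would fail. The SIMPLIFY = FALSE plus cbind() workaround at the end is a suggested variant, not part of the answer's code.

```r
# BOD is built into R: 6 rows, columns Time and demand
nms  <- c("Time", "demand")
vals <- list(1, 8.3)

# Step 1: mapply() compares each column with its value -> 6 x 2 logical matrix
m <- mapply(`==`, BOD[nms], vals)
print(m)

# Step 2: apply() reduces each row to a single TRUE/FALSE
keep <- apply(m, 1, all)
print(BOD[keep, ])

# Edge case: with one row, mapply() simplifies to a plain named vector,
# so apply(v, 1, all) would stop with an error about dim(X)
one_row <- BOD[1, ]
v <- mapply(`==`, one_row[nms], vals)
print(is.matrix(v))

# Suggested variant: keep a list, then bind it into a 1 x 2 matrix explicitly
m1 <- do.call(cbind, mapply(`==`, one_row[nms], vals, SIMPLIFY = FALSE))
keep1 <- apply(m1, 1, all)
print(one_row[keep1, ])
```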

The test calls use the data frame defined reproducibly in the Note at the end.

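f1 <- function(dat, ...) {
  if (n <- ...length()) {
    # use the argument names if given, otherwise the first n column names
    if (is.null(nms <- ...names())) nms <- head(names(dat), n)
    # rows containing NAs are assumed not to match
    dat <- na.omit(dat)
    # logical matrix with one column per criterion; keep rows where all are TRUE
    dat[apply(mapply(`==`, dat[nms], list(...)), 1, all), ]
  }
}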

Now we run some tests

f1(dat, "3rd", "Male", "Child", "No")
##   Class  Sex   Age Survived
## 1   3rd Male Child       No
## 2   3rd Male Child       No
## 3   3rd Male Child       No
## 4   3rd Male Child       No
## 5   3rd Male Child       No
## 6   3rd Male Child       No

f1(dat, "3rd", "Female", "Child", "No")
## [1] Class    Sex      Age      Survived
## <0 rows> (or 0-length row.names)

f1(dat, "3rd")
##   Class  Sex   Age Survived
## 1   3rd Male Child       No
## 2   3rd Male Child       No
## 3   3rd Male Child       No
## 4   3rd Male Child       No
## 5   3rd Male Child       No
## 6   3rd Male Child       No

f1(BOD, 1, 8.3)  # BOD is built into R
##   Time demand
## 1    1    8.3

f1(BOD, demand = 8.3)
##   Time demand
## 1    1    8.3

Note

Lines <- "
Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No"
dat <- read.table(text = Lines)

Update

Now allows fewer arguments than columns, and allows the arguments to be named.

Upvotes: 2

TarJae

Reputation: 79194

Update:

Store your columns and your conditions in one vector each, then apply the function to the data frame:
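A stricter variant of the same "one vector of columns, one vector of conditions" idea is sketched below in base R, assuming the two vectors are paired by position. Each column is compared only with its own condition by exact equality, rather than with a regex built from all of the conditions, so in general a value cannot pass by matching another column's criterion or a substring. The toy data and the helper filter_exact() are hypothetical.

```r
# Hypothetical toy data using the question's column names
passengers <- data.frame(
  Sex      = c("Male", "Female", "Female"),
  Age      = c("Child", "Child", "Adult"),
  Class    = c("3rd", "3rd", "3rd"),
  Survived = c("No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# One vector of columns, one vector of conditions, paired by position
cols  <- c("Sex", "Age", "Class", "Survived")
conds <- c("Female", "Child", "3rd", "Yes")

# Hypothetical helper: exact, column-by-column matching
filter_exact <- function(data, cols, conds) {
  # Map() returns one logical vector per column (column == its own condition);
  # Reduce(`&`) combines them, so a row is kept only if all are TRUE
  keep <- Reduce(`&`, Map(`==`, data[cols], conds))
  data[keep, , drop = FALSE]
}

res <- filter_exact(passengers, cols, conds)
print(res)  # only row 2: Female, Child, 3rd, Yes
```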

library(dplyr)
library(stringr)

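# criteria from the question, collapsed into one regex: "Female|Child|3rd|Yes"
f1 <- paste(c("Female", "Child", "3rd", "Yes"), collapse = "|")
cols <- c("Sex", "Age", "Class", "Survived")

my_function <- function(df){
  df %>% 
    select(all_of(cols)) %>% 
    # keep rows where every selected column matches one of the criteria
    filter(if_all(everything(), ~str_detect(., f1)))
}
my_function(titanic)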

First answer:

Maybe another strategy could be:

library(dplyr)
library(stringr)

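# criteria from the question, collapsed into one regex: "Female|Child|3rd|Yes"
f1 <- paste(c("Female", "Child", "3rd", "Yes"), collapse = "|")

my_function <- function(df){
  df %>% 
    select(Sex, Age, Class, Survived) %>% 
    # keep rows where every selected column matches one of the criteria
    filter(if_all(everything(), ~str_detect(., f1)))
}

my_function(titanic)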

output:

       Sex   Age Class Survived
1534 Female Child   3rd      Yes
1535 Female Child   3rd      Yes
1536 Female Child   3rd      Yes
1537 Female Child   3rd      Yes
1538 Female Child   3rd      Yes

Upvotes: 1

Eric

Reputation: 1389

# toy dataset
set.seed(1912)
titanic <- data.frame(class    = sample(c("1st", "2nd", "3rd"), 100, replace = TRUE),
                      sex      = sample(c("Male", "Female"), 100, replace = TRUE),
                      age      = sample(c("Child", "Adult"), 100, replace = TRUE),
                      survival = sample(c("Yes", "No"), 100, replace = TRUE)
                      )

# keep the rows where all four columns equal the supplied values
f1 <- function(sex, age, class, survival) {
  titanic[titanic$class == class & titanic$sex == sex &
            titanic$age == age & titanic$survival == survival, ]
}

f1("Female", "Child","3rd", "Yes")

   class    sex   age survival
11   3rd Female Child      Yes
15   3rd Female Child      Yes
38   3rd Female Child      Yes
71   3rd Female Child      Yes
85   3rd Female Child      Yes
94   3rd Female Child      Yes

Upvotes: 1

Martin Gal

Reputation: 16998

If you are using a data frame like the one shown in your question, you could use

library(dplyr)
my_filter <- function(sex, age, class, survived) {

  titanic %>% 
    filter(Sex == sex, Age == age, Class == class, Survived == survived)

}

Now my_filter("Female", "Child","3rd", "Yes") returns

   Class    Sex   Age Survived
7    3rd Female Child      Yes
8    3rd Female Child      Yes
9    3rd Female Child      Yes
10   3rd Female Child      Yes
11   3rd Female Child      Yes 

Upvotes: 1
