user2174781

I'm seeking a wildcard character for my function in R, but don't know what to search for

Reproducible example

I want to write a function that fits regressions on a subset of the data. My reproducible example is below:

set.seed(1) # Reproducibility

testdat <- data.frame(x = runif(100),
                      y = rnorm(100),
                      factor = sample(c("A","B"),100,replace=T)) # Create a dummy data set

test.model <- function(input.factor = NULL){
  model.out = lm(y~x,data = testdat[which(testdat$factor == input.factor),])
} # Create a function that regresses y on x, after subsetting

modelA <- test.model(input.factor = "A") # Works fine
modelB <- test.model(input.factor = "B") # Also works fine
modelAll <- test.model(input.factor = "???") # I'm seeking the keyword for all the data here

Problem

My function works fine for cases where input.factor = "A" or "B", but I want to use the function on the entire data set. I've tried using the * wildcard, but that only seems to work for regular expressions.

My question is, what string do I need to type in input.factor = to select all values of the factor variable?

PS As a statistician I know that I should include the factor variable in the regression itself. However, my actual use case is a more complicated model with much more data, so computing a complete model takes too much time.

Upvotes: 1

Views: 167

Answers (2)

joran

Reputation: 173677

I would just do this:

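test.model <- function(input.factor = NULL){
  if (is.null(input.factor)){
    # No factor level given: fit on the entire data set
    model.out = lm(y~x,data = testdat)
  } else{
    # Fit only on the rows matching the requested factor level
    model.out = lm(y~x,data = testdat[which(testdat$factor == input.factor),])
  }
  model.out
}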
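
To see the NULL default in action, here is a minimal self-contained sketch. It rebuilds the question's dummy data; the extra data argument is an assumption added here so the function does not have to rely on a global variable, and it is not part of the answer above.

```r
set.seed(1) # Reproducibility

testdat <- data.frame(x = runif(100),
                      y = rnorm(100),
                      factor = sample(c("A", "B"), 100, replace = TRUE))

# 'data' is a hypothetical extra argument; it defaults to the global testdat
test.model <- function(input.factor = NULL, data = testdat) {
  if (!is.null(input.factor)) {
    # Keep only the rows for the requested factor level
    data <- data[which(data$factor == input.factor), ]
  }
  lm(y ~ x, data = data)
}

modelAll <- test.model()    # no argument: all rows are used
modelA   <- test.model("A") # only rows where factor == "A"

nobs(modelAll) # number of observations used in the full fit
nobs(modelA)   # number of observations used in the "A" fit
```

With this pattern there is no wildcard string at all: leaving the argument out is what selects the whole data set.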

Upvotes: 2

SeGa

Reputation: 9809

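As @Ronak already pointed out, you have to make two changes: the function call test.model and your "wildcard". * is SQL syntax, but you can ask for all the unique values in the dataset with unique(). Note that the subsetting below uses %in% instead of ==, so that input.factor can hold several values at once.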

set.seed(1) # Reproducibility

testdat <- data.frame(x = runif(100),
                      y = rnorm(100),
                      factor = sample(c("A","B"),100,replace=T)) # Create a dummy data set

test.model <- function(input.factor = NULL){
  model.out = lm(y~x,data = testdat[which(testdat$factor %in% input.factor),])
} # Create a function that regresses y on x, after subsetting

modelA <- test.model(input.factor = "A") # Works fine
modelB <- test.model(input.factor = "B") # Also works fine
modelAll <- test.model(input.factor = unique(testdat$factor))
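
As a possible variation on the approach above, unique(testdat$factor) could be made the default value of the argument, so that calling the function with no argument fits on all rows. This is only a sketch under that assumption; the default value and the modelAB example are additions for illustration.

```r
set.seed(1) # Reproducibility

testdat <- data.frame(x = runif(100),
                      y = rnorm(100),
                      factor = sample(c("A", "B"), 100, replace = TRUE))

# Assumed default: every distinct value of the factor column
test.model <- function(input.factor = unique(testdat$factor)) {
  # %in% matches each row against a vector of allowed values
  lm(y ~ x, data = testdat[testdat$factor %in% input.factor, ])
}

modelA   <- test.model("A")         # one level
modelAB  <- test.model(c("A", "B")) # several levels at once
modelAll <- test.model()            # default: all levels, hence all rows

nobs(modelAll) # number of observations used in the full fit
```

Because %in% accepts a vector, the same function covers a single level, any subset of levels, and the full data set.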

Upvotes: 1
