Reputation:
I want to write a function that fits regressions on a subset of my data. My reproducible example is below:
set.seed(1) # Reproducibility
testdat <- data.frame(x = runif(100),
                      y = rnorm(100),
                      factor = sample(c("A", "B"), 100, replace = TRUE)) # Create a dummy data set
test.model <- function(input.factor = NULL){
  model.out = lm(y ~ x, data = testdat[which(testdat$factor == input.factor), ])
} # Create a function that regresses y on x, after subsetting
modelA <- test.model(input.factor = "A") # Works fine
modelB <- test.model(input.factor = "B") # Also works fine
modelAll <- test.model(input.factor = "???") # I'm seeking the keyword for all the data here
My function works fine for cases where input.factor = "A" or "B", but I want to use the function on the entire data set. I've tried using the * wildcard, but that only seems to work for regular expressions.
My question is: what string do I need to pass as input.factor = to select all values of the factor variable?
PS: As a statistician, I know that I should include the factor variable in the regression itself. However, my actual use case is a more complicated model with much more data, so computing a complete model takes too much time.
Upvotes: 1
Views: 167
Reputation: 173677
I would just do this:
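test.model <- function(input.factor = NULL){
  if (is.null(input.factor)){
    # No factor level given: fit on the entire data set
    model.out = lm(y ~ x, data = testdat)
  } else {
    # Fit only on the rows matching the requested level
    model.out = lm(y ~ x, data = testdat[which(testdat$factor == input.factor), ])
  }
  model.out
}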
Upvotes: 2
Reputation: 9809
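As @Ronak already pointed out, you have to make two changes: one to the function test.model and one to your "wildcard".
* is SQL syntax, but you can ask for all the unique names in the dataset with unique(). For that to work, the function has to filter with %in% rather than ==, so that it accepts a vector of levels.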
set.seed(1) # Reproducibility
testdat <- data.frame(x = runif(100),
                      y = rnorm(100),
                      factor = sample(c("A", "B"), 100, replace = TRUE)) # Create a dummy data set
test.model <- function(input.factor = NULL){
  model.out = lm(y ~ x, data = testdat[which(testdat$factor %in% input.factor), ])
} # Create a function that regresses y on x, after subsetting
modelA <- test.model(input.factor = "A") # Works fine
modelB <- test.model(input.factor = "B") # Also works fine
modelAll <- test.model(input.factor = unique(testdat$factor)) # All levels, i.e. the entire data set
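The two ideas in this thread (a NULL default meaning "no filter", and %in% for matching one or several levels) can be combined. The following is only a sketch: fit_subset and its data argument are hypothetical names that do not appear in the posts above, and passing the data frame in explicitly instead of relying on the global testdat is an assumption about what you might prefer.

```r
# Hypothetical helper (not from the original posts): fit y ~ x on a subset of
# `data`, or on all rows when `input.factor` is NULL.
fit_subset <- function(data, input.factor = NULL) {
  if (is.null(input.factor)) {
    rows <- rep(TRUE, nrow(data))          # no filter requested: keep every row
  } else {
    rows <- data$factor %in% input.factor  # keep rows whose level was requested
  }
  lm(y ~ x, data = data[rows, , drop = FALSE])
}

set.seed(1) # Same dummy data as in the question
testdat <- data.frame(x = runif(100),
                      y = rnorm(100),
                      factor = sample(c("A", "B"), 100, replace = TRUE))

modelA   <- fit_subset(testdat, "A")          # one level
modelAB  <- fit_subset(testdat, c("A", "B"))  # several levels at once
modelAll <- fit_subset(testdat)               # NULL default: entire data set
```

With this shape there is no need for a special "all" string: leaving input.factor out selects every row, and a character vector selects any combination of levels.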
Upvotes: 1