BUML1290

Reputation: 93

Counting variables in a formula

I would like to count the number of variables that enter into the right hand side of a formula. Is there a function that does this?

For example:

y<-rnorm(100)
x1<-rnorm(100)
x2<-rnorm(100)
x3<-rnorm(100)
f<-formula(y~x1+x2+x3)

Then, I would call SomeFunction(f) which would return 3 (since there are 3 x variables on the right hand side of the equation). Does SomeFunction exist?

Upvotes: 5

Views: 2660

Answers (4)

Mark Miller

Reputation: 13103

If you want to count the number of estimated parameters, as suggested by your comment below G. Grothendieck's answer, you could try the code below. I added one to n.coefficients for the error term, as is done with AIC.

n      <- 20                                       # number of observations
B0     <-  2                                       # intercept
B1     <- -1.5                                     # slope 1
B2     <-  0.5                                     # slope 2
B3     <- -2.5                                     # slope 3
sigma2 <-  5                                       # residual variance

x1     <- sample(1:3, n, replace=TRUE)             # categorical covariate
x12    <- ifelse(x1==2, 1, 0)
x13    <- ifelse(x1==3, 1, 0)
x3     <- round(runif(n, -5 , 5), digits = 3)      # continuous covariate
eps    <- rnorm(n, mean = 0, sd = sqrt(sigma2))    # error
y      <- B0 + B1*x12 + B2*x13 + B3*x3 + eps       # dependent variable
x1     <- as.factor(x1)

model1 <- lm(y ~ x1 + x3)                          # linear regression
model1

summary(model1)

# The first component of an lm object is its coefficient vector; add 1 for the error variance
n.coefficients <- as.numeric(sapply(model1, length)[1]) + 1
n.coefficients

# [1] 5

Here is a more straightforward alternative to the code for n.coefficients:

# One coefficient is estimated for the intercept, for each dummy of a factor
# and for each continuous covariate; add 1 for the error variance, as AIC does
n.coefficients2 <- length(model1$coefficients) + 1
n.coefficients2

# [1] 5
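As a cross-check, here is a sketch with simulated data (the seed, sample size and coefficients are arbitrary, not taken from the answer above). For an lm fit, the "df" attribute of logLik(), which is what AIC() uses, should equal the number of coefficients plus one for the error variance:

```r
set.seed(42)                                  # simulated data, for illustration only
n  <- 50
x1 <- factor(sample(1:3, n, replace = TRUE))  # 3-level factor -> 2 dummy coefficients
x3 <- runif(n, -5, 5)                         # continuous covariate
y  <- 2 + 0.5 * x3 + rnorm(n)

fit <- lm(y ~ x1 + x3)

# Intercept + 2 dummies + 1 slope, plus 1 for the residual variance
n_params <- length(coef(fit)) + 1

# logLik() for an lm counts the same parameters; AIC() uses this "df" attribute
n_params_loglik <- attr(logLik(fit), "df")

n_params
n_params_loglik
```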

Upvotes: 1

dardisco

Reputation: 5274

In light of your comment, this may depend on how you're fitting the model...

In the case of a linear model, each of these gives 12:

set.seed(1)
df1 <- data.frame (y=rnorm(100),
                   x=rnorm(100),
                   months=sample(letters[1:12], replace=TRUE, size=100))
f1 <-formula(y~x+factor(months))
l1 <- lm(f1, data=df1)
ncol(l1$qr$qr)-1

or

length(colnames(l1$qr$qr))-1

Here qr is the QR decomposition of the model matrix used in fitting the model. It has one column per estimated coefficient, including the intercept, which is why 1 is subtracted.
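A minimal sketch, reusing the df1 setup above, of how those QR columns relate to the design matrix. model.matrix() and the rank component are standard parts of an lm fit; the comparison with rank assumes no coefficient is aliased (in a rank-deficient fit, rank would be smaller than the column count).

```r
set.seed(1)
df1 <- data.frame(y      = rnorm(100),
                  x      = rnorm(100),
                  months = sample(letters[1:12], replace = TRUE, size = 100))
l1 <- lm(y ~ x + factor(months), data = df1)

X <- model.matrix(l1)        # the design matrix whose QR decomposition lm() stores
ncol(X)                      # intercept + x + one dummy per non-baseline month
ncol(l1$qr$qr) == ncol(X)    # the stored QR has one column per design-matrix column
l1$rank                      # numerical rank; equals ncol(X) when nothing is aliased
ncol(X) - 1                  # drop the intercept column
```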

You could also get the number of levels of a factor from the model.frame, for example:

length(unique(model.frame(l1)[["factor(months)"]]))

Or more generally with .getXlevels, which will give you a list of unique values for each factor on the predictor side, as in:

length( stats::.getXlevels(terms(l1), model.frame(l1))[[1]] )
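For a fitted lm this list is already stored: the xlevels component is built with .getXlevels() when the model is fitted. A sketch on the same df1 data; the "minus one" line assumes the default treatment contrasts and a model with an intercept.

```r
set.seed(1)
df1 <- data.frame(y      = rnorm(100),
                  x      = rnorm(100),
                  months = sample(letters[1:12], replace = TRUE, size = 100))
l1 <- lm(y ~ x + factor(months), data = df1)

# Named list of levels, one element per factor on the predictor side
str(l1$xlevels)

# Number of levels per factor
lengths(l1$xlevels)

# Dummy coefficients per factor (the baseline level is absorbed by the intercept),
# assuming default treatment contrasts and an intercept in the model
lengths(l1$xlevels) - 1
```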

Update

@Mark Miller was barking up a better tree. If your model has an AIC-type method available, you should be able to use it to get the number of parameters. For an lm, the method is extractAIC.lm, which is not exported from stats, so you can call it directly with ::: like this:

stats:::extractAIC.lm(l1)[[1]] -1
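The exported generic extractAIC() dispatches to that method, so the ::: is not strictly needed. A sketch contrasting it with logLik(), which shows why this answer subtracts one (the intercept) while Mark Miller's count adds one (the error variance):

```r
set.seed(1)
df1 <- data.frame(y      = rnorm(100),
                  x      = rnorm(100),
                  months = sample(letters[1:12], replace = TRUE, size = 100))
l1 <- lm(y ~ x + factor(months), data = df1)

# extractAIC() returns c(edf, AIC); for lm, edf counts the estimated coefficients only
edf <- extractAIC(l1)[1]
edf
l1$rank                      # same number for a full-rank fit

edf - 1                      # predictor coefficients, intercept excluded

# logLik() (and hence AIC()) additionally counts the residual variance
attr(logLik(l1), "df")
```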

Upvotes: 1

G. Grothendieck

Reputation: 269526

Here are two possibilities:

length(attr(terms(f), "term.labels"))

length(all.vars(update(f, z ~.))) - 1
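To see what each one-liner counts, here is a sketch wrapping both in small helpers (n_rhs_terms and n_rhs_vars are hypothetical names). They agree on the formula from the question but differ once an interaction appears, because terms() counts terms while all.vars() counts distinct variables:

```r
# Hypothetical helper names wrapping the two one-liners above
n_rhs_terms <- function(f) length(attr(terms(f), "term.labels"))
n_rhs_vars  <- function(f) length(all.vars(update(f, z ~ .))) - 1

f  <- y ~ x1 + x2 + x3
fi <- y ~ x1 * x2            # expands to x1 + x2 + x1:x2

c(terms = n_rhs_terms(f),  vars = n_rhs_vars(f))   # 3 terms, 3 variables
c(terms = n_rhs_terms(fi), vars = n_rhs_vars(fi))  # 3 terms, 2 variables
```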

Upvotes: 8

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You might need to look at some of the related functions linked in the help page for formula. In particular, terms (the output below is for a version of f with a fourth predictor, x4):

> terms(f)
y ~ x1 + x2 + x3 + x4
attr(,"variables")
list(y, x1, x2, x3, x4)
attr(,"factors")
   x1 x2 x3 x4
y   0  0  0  0
x1  1  0  0  0
x2  0  1  0  0
x3  0  0  1  0
x4  0  0  0  1
attr(,"term.labels")
[1] "x1" "x2" "x3" "x4"
attr(,"order")
[1] 1 1 1 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>

Note the "term.labels" attribute: its length is the number of terms on the right-hand side.
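A sketch of reading a few other attributes of the same kind of object, using a hypothetical formula with an interaction. The "order" attribute separates main effects from interactions, and the intercept is reported separately rather than in "term.labels":

```r
tt <- terms(y ~ x1 + x2 + x3 + x2:x3)

attr(tt, "term.labels")            # "x1" "x2" "x3" "x2:x3"
length(attr(tt, "term.labels"))    # all right-hand-side terms
sum(attr(tt, "order") == 1)        # main effects only
ncol(attr(tt, "factors"))          # one column per term, same count as term.labels
attr(tt, "intercept")              # 1 = intercept present, not listed in term.labels
```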

Upvotes: 8
