Conor

Reputation: 1527

Using columns with special characters in formulae in R

I'm trying to build a decision tree with rpart from a data frame that has ~200 columns. Some of these columns have numbers in their names, and some have special characters (e.g. "/"). When I try to generate the tree I get errors such as the ones below:

R> gg.rpart <- rpart(nospecialchar ~ Special/char, data=temp, method="class")
Error in eval(expr, envir, enclos) : object 'Special' not found
R> gg.rpart <- rpart(nospecialchar ~ "Special/char", data=temp, method="class")
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
R> gg.rpart <- rpart(nospecialchar ~ `Special/char`, data=temp, method="class")
Error in `[.data.frame`(frame, predictors) : undefined columns selected

Do I have to change the names to accommodate R or is there some way to pass column names with special characters to R formulae?

Upvotes: 8

Views: 9596

Answers (3)

Yun

Reputation: 305

I just came across the same problem, and I didn't want to change the names before passing them to R formulae. R allows non-syntactic column names when they are surrounded by backticks, so I tried adding backticks to the name, and that works well. My code is below:

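library(ordinal)  # assumed source of clm()
lapply(colnames(variable), function(gene) {
  # Wrap the column name in backticks so a non-syntactic name parses as one variable
  formula0 <- paste0("gleason_grade", " ~ `", gene, "`")
  logit <- clm(as.formula(formula0), data = mydata)
})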

Now you can pass the variable to the formula without error.
If, like me, you don't want to change the variable names, just wrap them in backticks.
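
For a version that runs without any extra packages, here is a minimal sketch of the same backtick idea using base R's lm(). The data frame `dat`, its column names, and the values in it are made up purely for illustration:

```r
# Hypothetical data with two non-syntactic column names
dat <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2),
                  "A/B" = 1:5,
                  "2nd col" = c(2, 1, 4, 3, 5),
                  check.names = FALSE)

# Every column except the response is used as a single predictor in turn
predictors <- setdiff(names(dat), "y")

fits <- lapply(predictors, function(p) {
  # Backticks make the parser treat the whole name as one variable
  f <- as.formula(paste0("y ~ `", p, "`"))
  lm(f, data = dat)
})
names(fits) <- predictors
```

Each element of `fits` is then an ordinary lm object, e.g. `fits[["A/B"]]`.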

Upvotes: 1

Conor

Reputation: 1527

Joran's comment on my question is the answer - I didn't know of the existence of make.names()

Joran, if you reply as an answer I'll mark you as correct. Cheers!
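
For anyone else who hadn't come across it, a rough sketch of how make.names() could be applied here. The small `temp` data frame below is a hypothetical stand-in for the real ~200-column one, and its values are invented:

```r
# Hypothetical stand-in for the real data frame
temp <- data.frame(nospecialchar = c("a", "b", "a", "b"),
                   "Special/char" = c(1, 2, 3, 4),
                   "2nd" = c(0, 1, 0, 1),
                   check.names = FALSE)

# make.names() turns each name into a syntactically valid one;
# unique = TRUE keeps the converted names distinct
names(temp) <- make.names(names(temp), unique = TRUE)
names(temp)
```

After the renaming, the columns can be written in a formula without any quoting, at the cost of the names no longer matching the originals.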

Upvotes: 3

Wojciech Sobala

Reputation: 7561

This works:

dat <- data.frame(M = rnorm(10), 'A/B' = 1:10, check.names = FALSE)

> lm(M~`A/B`,dat)

Call:
lm(formula = M ~ `A/B`, data = dat)

Coefficients:
(Intercept)        `A/B`  
    -1.0494       0.1214  
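
The check.names argument matters here. A small sketch of the difference, using made-up values (the names `d1` and `d2` are only for illustration):

```r
# Default check.names = TRUE: data.frame() rewrites non-syntactic names
d1 <- data.frame(M = c(1.5, 2.1, 3.8), "A/B" = 1:3)

# check.names = FALSE: the name is kept exactly as given
d2 <- data.frame(M = c(1.5, 2.1, 3.8), "A/B" = 1:3, check.names = FALSE)

names(d1)
names(d2)

# The backticked name only exists in d2
fit <- lm(M ~ `A/B`, data = d2)
```

With the default setting the column would be called A.B, so a formula using the backticked `A/B` would not find it in `d1`.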

Upvotes: 9
