Reputation: 1527
I'm trying to build a decision tree with rpart from a data frame that has ~200 columns. Some of these columns have numbers in their names, and some have special characters (e.g. "/"). When I try to generate the tree I get errors such as the ones below:
R> gg.rpart <- rpart(nospecialchar ~ Special/char, data=temp, method="class")
Error in eval(expr, envir, enclos) : object 'Special' not found
R> gg.rpart <- rpart(nospecialchar ~ "Special/char", data=temp, method="class")
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
R> gg.rpart <- rpart(nospecialchar ~ `Special/char`, data=temp, method="class")
Error in `[.data.frame`(frame, predictors) : undefined columns selected
Do I have to change the names to accommodate R or is there some way to pass column names with special characters to R formulae?
Upvotes: 8
Views: 9596
Reputation: 305
I just came across the same problem, and I didn't want to change the names before passing them to R formulae. R accepts non-syntactic column names in a formula if they are surrounded by backticks, so I added backticks around the name and it works well. My code is below:
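# clm() is assumed to be ordinal::clm
lapply(colnames(variable), function(gene) {
  # wrap the column name in backticks so non-syntactic names parse
  formula0 <- paste0("gleason_grade", " ~ ", "`", gene, "`")
  logit <- clm(as.formula(formula0), data = mydata)
})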
Now you can pass the variable to the formula without error.
If, like me, you don't want to change the variable names, just wrap them in backticks.
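For anyone who wants to try the backtick approach without my data, here is a minimal self-contained sketch. It uses lm() from base R instead of clm(), and the data frame `dat` and its column names are made up for illustration. It assumes none of the column names contains a backtick itself.

```r
# Made-up data: two predictors with non-syntactic names
set.seed(1)
dat <- data.frame(y = rnorm(10),
                  "A/B" = 1:10,
                  "2x" = rnorm(10),
                  check.names = FALSE)  # keep the names exactly as given

predictors <- setdiff(names(dat), "y")

# Fit one model per predictor, backticking each name in the formula string
fits <- lapply(predictors, function(p) {
  f <- as.formula(paste0("y ~ `", p, "`"))
  lm(f, data = dat)
})
names(fits) <- predictors
```

Each element of `fits` is an ordinary lm object, so `coef(fits[["A/B"]])` and friends work as usual.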
Upvotes: 1
Reputation: 1527
Joran's comment on my question is the answer - I didn't know of the existence of make.names()
Joran, if you reply as an answer I'll mark you as correct. Cheers!
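For later readers, a rough sketch of what the make.names() route looks like. The data frame below is a made-up stand-in for mine (only the column names matter), and keeping a lookup of the old names is just one way to map results back afterwards.

```r
# Made-up stand-in for the real data frame
temp <- data.frame(nospecialchar = c("a", "b", "a", "b"),
                   "Special/char" = c(1, 2, 3, 4),
                   "2nd col" = c(4, 3, 2, 1),
                   check.names = FALSE)

old_names <- names(temp)

# make.names() replaces invalid characters with "." and prefixes "X"
# to names that do not start with a letter or a dot
names(temp) <- make.names(old_names, unique = TRUE)

# Lookup from the new syntactic names back to the originals
name_map <- setNames(old_names, names(temp))

names(temp)
# "nospecialchar" "Special.char" "X2nd.col"
```

After this the columns can be used in a formula without any quoting, e.g. `nospecialchar ~ Special.char`.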
Upvotes: 3
Reputation: 7561
This works:
> dat <- data.frame(M = rnorm(10), 'A/B' = 1:10, check.names = FALSE)
> lm(M ~ `A/B`, dat)
Call:
lm(formula = M ~ `A/B`, data = dat)
Coefficients:
(Intercept) `A/B`
-1.0494 0.1214
Upvotes: 9