Reputation: 171
I'm trying to write a function that fits a model and predicts the target variable for any given data.frame (e.g. mtcars).
# Function to create a model for predicting a target variable
myRegModel = function(myFormula, myData){
  sampleIndex = sample(1:nrow(myData), size = 0.7*nrow(myData), replace = FALSE)
  myTraining = myData[sampleIndex, ]
  myTesting = myData[-sampleIndex, ]
  myDataFit = lm(myFormula, data = myTraining)
  myTesting$predVar <- predict(myDataFit, myTesting)
  myTesting$predErr <- abs(((myTesting$mpg - myTesting$predVar) / myTesting$mpg) * 100)
  print(cor(myTesting$mpg, myTesting$predVar))
  print(mean(myTesting$predErr))
  print(summary(myDataFit))
}
myRegModel(mpg ~ ., myMtCars)
However, I've hard-coded my target variable (mpg) when computing the prediction error and the correlation above. Since I'm passing the target variable to the function as part of the first argument, is there a way to extract it and refer to it dynamically in the myTesting data.frame (e.g. myTesting$target)?
Upvotes: 1
Views: 42
Reputation: 887158
Just to extend @RuiBarradas' approach, we can extract the variable name directly from the formula using all.vars, then use [[ as @RuiBarradas suggested.
myRegModel <- function(myFormula, myData){
  # Name of the response: the first variable in the formula
  nm1 <- all.vars(myFormula)[1]
  # Random 70/30 split into training and testing rows
  sampleIndex <- sample(seq_len(nrow(myData)), size = 0.7*nrow(myData), replace = FALSE)
  myTraining <- myData[sampleIndex, ]
  myTesting <- myData[-sampleIndex, ]
  myDataFit <- lm(myFormula, data = myTraining)
  myTesting$predVar <- predict(myDataFit, myTesting)
  # Absolute percentage error, with the response column looked up by name
  myTesting$predErr <- abs(((myTesting[[nm1]] - myTesting$predVar) / myTesting[[nm1]]) * 100)
  myTesting
}
myMtCars <- mtcars
myRegModel(mpg ~ ., myMtCars)
#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  predVar   predErr
#Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 26.43998 15.964845
#Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 20.84027  2.615556
#Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 20.30464 12.180316
#Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 18.10403  5.708192
#Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 11.22245  7.908153
#Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1 27.88747 13.927557
#Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 25.47992 18.511254
#Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 16.11037 16.091819
#Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 25.64254 15.649525
#Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8 11.47808 23.479490
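If you also want the correlation and mean error from the question without naming mpg, something along these lines might work. This is only a sketch: the function name evalRegModel, the returned list, the fixed seed, and the floor() around the sample size are my own assumptions, not part of the code above.

```r
# Sketch only: evalRegModel is a hypothetical name, not from the posts above.
evalRegModel <- function(myFormula, myData) {
  # all.vars() lists the variable names in the formula; the response comes first
  target <- all.vars(myFormula)[1]
  # Assumption: floor() makes the 70% sample size an explicit whole number
  sampleIndex <- sample(seq_len(nrow(myData)), size = floor(0.7 * nrow(myData)))
  myTraining <- myData[sampleIndex, ]
  myTesting <- myData[-sampleIndex, ]
  myDataFit <- lm(myFormula, data = myTraining)
  predVar <- predict(myDataFit, myTesting)
  # Look up the response column by name instead of hard-coding it
  actual <- myTesting[[target]]
  list(
    target = target,
    correlation = cor(actual, predVar),
    meanPctErr = mean(abs((actual - predVar) / actual) * 100)
  )
}

set.seed(1)  # assumption: fixed seed so the random split is reproducible
res <- evalRegModel(mpg ~ wt + hp, mtcars)
res$target   # "mpg"
```

One caveat: with a transformed response such as log(mpg) ~ wt, all.vars still returns "mpg", so the raw column would be compared with predictions on the log scale. The sketch assumes an untransformed response.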
Upvotes: 2
Reputation: 76432
Yes, there is a way of doing what you want. You'll just have to use a different notation for the columns of a data.frame. Generally speaking, in interactive mode it's OK to use dat$col. But when you program a function it's much better to use dat[[col]]. Both return exactly the same vector, but the latter is far more flexible because col can be a variable holding the column name.
So, in your case this would become myTesting[[target]], where target holds the name of the column as a character string.
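A tiny illustration of the difference, using a made-up two-row data.frame (the names dat and col are just for this example):

```r
# Toy data: dat and col are illustrative names only
dat <- data.frame(mpg = c(21.0, 22.8), wt = c(2.62, 2.32))
col <- "mpg"   # column name held in a variable

dat$col        # NULL: `$` looks for a column literally named "col"
dat[[col]]     # the mpg column, looked up via the value of col
identical(dat[[col]], dat$mpg)   # TRUE
```

This is why [[ is the safer choice inside a function, where the column name usually arrives as an argument or is computed at run time.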
Upvotes: 1