I'm a new R programmer and I'm writing my thesis on training a neural network. First I use rminer for data mining and then nnet for training. Now I don't know which function to use to divide the dataset into a training set and a validation set, i.e. for k-fold cross-validation, and then run nnet on each of them. Sorry for my English. Thanks in advance.
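A minimal base-R sketch of k-fold cross-validation with nnet, assuming nnet's formula interface. The built-in iris data stands in for the thesis dataset, and k = 5, size = 3 and decay = 1e-3 are illustrative, untuned values. Every row gets a random fold label, and each fold in turn is the validation set while nnet trains on the remaining k - 1 folds.

```r
library(nnet)   # ships with R as a recommended package

set.seed(1)                  # reproducible folds and initial weights
dat <- iris                  # stand-in for your own data frame (assumption)
k <- 5                       # number of folds (illustrative choice)

# Give every row a fold label 1..k in random order; fold sizes differ by at most one row
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

accuracy <- numeric(k)
for (i in seq_len(k)) {
  train <- dat[folds != i, ]   # the other k - 1 folds
  valid <- dat[folds == i, ]   # the held-out fold
  # size and decay are example values, not tuned ones
  fit <- nnet(Species ~ ., data = train, size = 3, decay = 1e-3,
              maxit = 200, trace = FALSE)
  pred <- predict(fit, newdata = valid, type = "class")
  accuracy[i] <- mean(pred == as.character(valid$Species))
}

print(accuracy)        # validation accuracy of each fold
print(mean(accuracy))  # cross-validated estimate
```

For a numeric target you would pass `linout = TRUE` to `nnet` and record an error measure such as RMSE instead of accuracy.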
It may be too late, but I found this question while looking for an answer to my own. You can use something like this:
# Splitting into training, cross-validation (CV) and test datasets
# 60% of the observations go to the training set, 20% to the CV set and 20% to the test set.
set.seed(42)  # make the random split reproducible
n <- nrow(DF.mergedPredModels)
train_ind <- sample(seq_len(n), size = floor(0.6 * n))
trainDF.mergedPredModels <- DF.mergedPredModels[train_ind, ]
# The CV and test observations come from the rows of the initial dataset that are not in the training set
restDF.mergedPredModels <- DF.mergedPredModels[-train_ind, ]
# Draw the CV indices only once: the CV set gets these rows and the test set gets all the others,
# so the two sets never overlap. Change 0.5 to give the CV set a different share of the remaining rows.
cv_ind <- sample(seq_len(nrow(restDF.mergedPredModels)), size = floor(0.5 * nrow(restDF.mergedPredModels)))
# Cross-Validation dataset
cvDF.mergedPredModels <- restDF.mergedPredModels[cv_ind, ]
# Testing dataset
testDF.mergedPredModels <- restDF.mergedPredModels[-cv_ind, ]
# The temporal columns are kept in separate data frames because the models should not be built on the dates.
# After the predictions are made, you can add these columns back to the training, CV and test sets and plot the real values of the predicted parameter and the corresponding predictions over your time variables (half-hour, hour, day, week, month, quarter, season, year, etc.).
# aa = columns used in the temporal datasets
aa <- c("date", "period", "publish_date", "quarter", "month", "Season")
temporaltrainDF.mergedPredModels <- trainDF.mergedPredModels[, aa]
temporalcvDF.mergedPredModels <- cvDF.mergedPredModels[, aa]
temporaltestDF.mergedPredModels <- testDF.mergedPredModels[, aa]
# bb = columns used in the training, CV and test datasets
bb <- c("quarter", "month", "Season", "period", "temp.mean", "wind_speed.mean", "solar_radiation", "realValue")
trainDF.mergedPredModels.Orig <- trainDF.mergedPredModels[, bb]
trainDF.mergedPredModels <- trainDF.mergedPredModels[, bb]
smalltrainDF.mergedPredModels.Orig <- trainDF.mergedPredModels.Orig[1:10, ]  # 10-row subset to check that the models converge without errors
cvDF.mergedPredModels <- cvDF.mergedPredModels[, bb]
testDF.mergedPredModels <- testDF.mergedPredModels[, bb]
# /Splitting into training, cross-validation and test datasets
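Drawing the CV and test indices with two separate `sample()` calls would let those sets overlap. A sketch of the same 60/20/20 idea as a reusable function: it shuffles the row indices once and cuts the permutation into three consecutive pieces, so the parts are disjoint by construction. `split_train_cv_test` is a hypothetical name, and iris stands in for `DF.mergedPredModels`. Note that this is still a single hold-out split, not k-fold cross-validation.

```r
# Hypothetical helper: split a data frame into disjoint train / CV / test parts
split_train_cv_test <- function(df, p_train = 0.6, p_cv = 0.2, seed = NULL) {
  stopifnot(p_train > 0, p_cv >= 0, p_train + p_cv <= 1)
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(df)
  idx <- sample(seq_len(n))              # one random permutation of all row indices
  n_train <- floor(p_train * n)
  n_cv <- floor(p_cv * n)
  list(
    train = df[idx[seq_len(n_train)], , drop = FALSE],
    cv    = df[idx[n_train + seq_len(n_cv)], , drop = FALSE],
    test  = df[idx[-seq_len(n_train + n_cv)], , drop = FALSE]  # everything left over
  )
}

parts <- split_train_cv_test(iris, seed = 1)
sapply(parts, nrow)  # train 90, cv 30, test 30
```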
Here is a way to get help on a new topic / package in R when you don't know how to go about it:
library(help=package.name)
This will give you an overview of all the functions and data sets defined in the package, with a brief title for each. After you have identified the functions that you need, you can consult the documentation of the functions of interest like so:
?function.name
In the documentation, also pay attention to the See Also section, which typically lists functions that are useful in conjunction with the function being considered. Also, work through the examples. You can also use
example(function.name)
for a demonstration of the function's use and common idioms using it.
Lastly, if you are lucky, the package author may have written a vignette for the package. You can list all vignettes in a package like this:
vignette(package="package.name")
Hopefully, this will get you started with the rminer and nnet packages.
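Applied to nnet, the steps above look roughly like this. A minimal sketch: `ls("package:nnet")` is an extra step not in the answer that lists the objects the attached package exports, and the same calls should work for rminer after swapping in its name (assuming rminer is installed).

```r
library(nnet)

# Names of all objects exported by the attached nnet package
fns <- ls("package:nnet")
print(fns)

# Run the examples from the nnet() help page without pausing between them
example("nnet", package = "nnet", ask = FALSE)

# List any vignettes the package ships
vig <- vignette(package = "nnet")
```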