Reputation: 31
I have a functioning pylearn2 neural network which loads data from a CSV and predicts a continuous target variable. How can I change it to predict multiple distinct target variables?
I am using Kaggle's African soil dataset and have constructed this functioning mlp.yaml file:
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.csv_dataset.CSVDataset {
        path: 'C:\Users\POWELWE\Git\pylearn2\pylearn2\datasets\soil\training_CA.csv',
        task: 'regression',
        start: 0,
        stop: 1024,
        expect_headers: True,
        num_outputs: 1
    },
    model: !obj:pylearn2.models.mlp.MLP {
        layers : [
            !obj:pylearn2.models.mlp.RectifiedLinear {
                layer_name: 'h0',
                dim: 200,
                irange: .05,
                max_col_norm: 2.
            },
            !obj:pylearn2.models.mlp.RectifiedLinear {
                layer_name: 'h1',
                dim: 200,
                irange: .05,
                max_col_norm: 2.
            },
            !obj:pylearn2.models.mlp.LinearGaussian {
                init_bias: !obj:pylearn2.models.mlp.mean_of_targets {
                    dataset: *train },
                init_beta: !obj:pylearn2.models.mlp.beta_from_targets {
                    dataset: *train },
                min_beta: 1.,
                max_beta: 100.,
                beta_lr_scale: 1.,
                dim: 1,
                layer_name: 'y',
                irange: .005
            }
        ],
        nvis: 3594,
    },
    algorithm: !obj:pylearn2.training_algorithms.bgd.BGD {
        line_search_mode: 'exhaustive',
        batch_size: 1024,
        conjugate: 1,
        reset_conjugate: 0,
        reset_alpha: 0,
        updates_per_batch: 10,
        monitoring_dataset:
            {
                'train' : *train,
                'valid' : !obj:pylearn2.datasets.csv_dataset.CSVDataset {
                    path: 'C:\Users\POWELWE\Git\pylearn2\pylearn2\datasets\soil\training_CA.csv',
                    task: 'regression',
                    start: 1024,
                    stop: 1156,
                    expect_headers: True,
                }
            },
        termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
            channel_name: "valid_y_mse",
            prop_decrease: 0.,
            N: 100
        },
    },
    extensions: [
        !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
            channel_name: 'valid_y_mse',
            save_path: "${PYLEARN2_TRAIN_FILE_FULL_STEM}_best.pkl"
        },
    ],
    save_path: "mlp.pkl",
    save_freq: 1
}
To predict a single target variable, I removed all target variables from the dataset except Ca and moved that one to the first column. When I run the following command in the IPython console, it works for that single variable:
%run 'C:\Users\POWELWE\Git\pylearn2\pylearn2\scripts\train.py' mlp.yaml
I would like to include the other 4 target variables (P, pH, SOC, Sand), but do not know how to set up my model to train on these additional targets. I assume I need to change num_outputs, dim, or nvis, but haven't had any success in my attempts. This is a precursor project to one with many more target variables, so it is important that I train a single network rather than constructing a new network for each target variable.
Upvotes: 3
Views: 922
Reputation: 4792
To train a network that predicts the values of several variables at the same time, you just need to set up your network with multiple output neurons and feed it the training data the same way you do now, but with multiple target values per example. I have never used pylearn2; I prefer Caffe, nolearn (lasagne) or pybrain, and each of these libraries can easily handle such cases.
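To make the idea of "multiple output neurons" concrete, here is a minimal pure-Python sketch. It is not pylearn2 or pybrain code: the helper names (`split_targets`, `predict`, `train_multi_output`) and the toy data are made up for illustration, and the model is only a single linear layer trained by plain gradient descent. It assumes the targets occupy the leading columns of each row, as in the question's CSV. In the YAML above, the analogous changes would presumably be `num_outputs` on both datasets and `dim` on the output layer, with `nvis` still equal to the number of feature columns, but that is an assumption I have not tested.

```python
import random


def split_targets(rows, num_outputs):
    """Treat the first num_outputs columns as targets, the rest as features."""
    Y = [row[:num_outputs] for row in rows]
    X = [row[num_outputs:] for row in rows]
    return X, Y


def predict(W, bias, x):
    """One output neuron per row of W; each computes w . x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk
            for row, bk in zip(W, bias)]


def train_multi_output(X, Y, lr=0.5, epochs=2000, seed=0):
    """Fit all targets at once by minimising the mean squared error."""
    rng = random.Random(seed)
    n, n_in, n_out = len(X), len(X[0]), len(Y[0])
    W = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        grad_W = [[0.0] * n_in for _ in range(n_out)]
        grad_b = [0.0] * n_out
        for x, y in zip(X, Y):
            pred = predict(W, bias, x)
            for k in range(n_out):
                err = pred[k] - y[k]  # error of output neuron k
                grad_b[k] += err
                for j in range(n_in):
                    grad_W[k][j] += err * x[j]
        # Full-batch gradient step, averaged over the n examples.
        for k in range(n_out):
            bias[k] -= lr * grad_b[k] / n
            for j in range(n_in):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, bias


# Toy data: two targets (2*x1 + 1 and x1 - x2) followed by two features.
grid = (0.0, 0.5, 1.0)
rows = [[2 * x1 + 1, x1 - x2, x1, x2] for x1 in grid for x2 in grid]

X, Y = split_targets(rows, num_outputs=2)
W, bias = train_multi_output(X, Y)
print(predict(W, bias, [0.2, 0.7]))  # roughly [1.4, -0.5]
```

The only things that change relative to the single-target case are the width of the target matrix and the number of output neurons; the features and the training loop stay the same.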
Example of a pybrain implementation (this code was used in Kaggle's BikeShare challenge):
Upvotes: 0