Reputation: 185
I am trying to train an auto-encoder model in R with h2o to detect anomalies in my dataset. Here is my code:
df <- read.csv(file = inputFile)  # read the dataset into a data frame
feature_names <- names(df)
train_df <- df  # use the whole dataset for training in this example

# -- Now train the auto-encoder model --
library(h2o)
localH2O <- h2o.init()
h2o.removeAll()  # remove any objects left on the cluster from earlier sessions
train_h2o <- as.h2o(train_df)  # load the data into an H2O frame

# Create the deep learning auto-encoder model
result_model <- h2o.deeplearning(x = feature_names,
                                 training_frame = train_h2o,
                                 autoencoder = TRUE,
                                 hidden = c(6, 5, 6),
                                 epochs = 50)
Then, after the model trains successfully, I enter result_model and get:
layer units type dropout l1 l2 mean_rate rate_rms momentum
1 1 798 Input 0.00 % NA NA NA NA NA
2 2 6 Rectifier 0.00 % 0.000000 0.000000 0.018308 0.110107 0.000000
3 3 5 Rectifier 0.00 % 0.000000 0.000000 0.002325 0.001377 0.000000
4 4 6 Rectifier 0.00 % 0.000000 0.000000 0.001975 0.001191 0.000000
5 5 798 Rectifier NA 0.000000 0.000000 0.010888 0.064831 0.000000
The layer units are 798, 6, 5, 6, 798, even though the input layer was supposed to have only 7 nodes.
Can anyone help with this? It would be much appreciated.
Upvotes: 2
Views: 126
Reputation: 8819
The first layer in a DNN is the input layer: its size is the number of variables (or encoded variables) in your training set.
To summarize the comments above: your training frame is being expanded (by default, one-hot encoded) for any categorical columns it contains. Given the screenshot of your dataset, nearly all of your columns appear to be categorical, and together they must expand to roughly 798 encoded columns. So what you're seeing is expected. And since this is an autoencoder, the output layer must be the same size as the input layer, which is why the last layer also has 798 units.
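As a quick sanity check (a sketch, assuming read.csv imported your categorical columns as factors, e.g. with stringsAsFactors = TRUE), you can estimate the width of the one-hot-expanded frame yourself:

# Each factor contributes one column per level; numeric columns contribute one.
encoded_width <- sum(sapply(df, function(col) {
  if (is.factor(col)) nlevels(col) else 1
}))
encoded_width  # should land close to the 798 units reported for the input layer

The exact count can differ slightly from H2O's internal encoding (e.g. extra handling for missing levels), but it should be in the same ballpark as 798. And since your goal is anomaly detection, once the layer sizes make sense you can score per-row reconstruction error with h2o.anomaly(result_model, train_h2o); rows with unusually high MSE are your anomaly candidates.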
Upvotes: 1