nernac

Reputation: 185

Why does my h2o auto-encoder have so many input nodes?

I am trying to train an auto-encoder model in R with h2o to detect anomalies in my dataset.

Here is my code:

df <- read.csv(file = inputFile) # Read the data into a data frame

feature_names <- names(df)

train_df <- df # Use whole dataset for training for this example

# -- Now train auto-encoder model --
library(h2o)
localH2O <- h2o.init() # Start (or connect to) a local H2O cluster
h2o.removeAll() # Clear any objects left in the cluster from previous runs

train_h2o <- as.h2o(train_df) # Put data in h2o dataframe

# Create the deep learning autoencoder model
result_model <- h2o.deeplearning(x = feature_names, training_frame = train_h2o,
                                 autoencoder = TRUE,
                                 hidden = c(6, 5, 6),
                                 epochs = 50)

Then after the model trains successfully, I enter result_model and get:

  layer units      type dropout       l1       l2 mean_rate rate_rms momentum
1     1   798     Input  0.00 %       NA       NA        NA       NA       NA
2     2     6 Rectifier  0.00 % 0.000000 0.000000  0.018308 0.110107 0.000000
3     3     5 Rectifier  0.00 % 0.000000 0.000000  0.002325 0.001377 0.000000
4     4     6 Rectifier  0.00 % 0.000000 0.000000  0.001975 0.001191 0.000000
5     5   798 Rectifier      NA 0.000000 0.000000  0.010888 0.064831 0.000000

The layer units are 798, 6, 5, 6, 798, even though my dataset has only 7 columns, so I expected 7 input nodes.

Can anyone help with this? It would be much appreciated.

Upvotes: 2

Views: 126

Answers (1)

Erin LeDell

Reputation: 8819

The first layer in a DNN is the input layer -- its size is the number of variables (or encoded variables) in your training set.

To summarize the comments above: your training frame is being expanded (by default, one-hot encoded) for any categorical columns it contains. Given the screenshot of your dataset, nearly all of your columns appear to be categorical, and together they must have a total of ~798 category levels, so what you're seeing is expected. Since it's an autoencoder, the output layer is the same size as the input layer, which is why the last layer is 798 units as well.
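As a sanity check, you can estimate the expanded input width yourself before training. Below is a minimal sketch in base R, assuming one-hot encoding contributes one input unit per category level and one unit per numeric column (H2O's internal count can differ slightly, e.g. around missing values):

# Rough estimate of the one-hot-expanded input width:
# one unit per factor/character level, one per numeric column.
n_inputs <- sum(sapply(df[feature_names], function(col) {
  if (is.factor(col) || is.character(col)) {
    length(unique(col))  # one input unit per category level
  } else {
    1                    # numeric columns map to a single unit
  }
}))
n_inputs  # should land near 798 for this dataset

If the expanded width becomes a problem, h2o.deeplearning also accepts a categorical_encoding argument (e.g. "Binary" or "Eigen") that yields a more compact input representation than the default one-hot expansion.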

Upvotes: 1
