Igor Melnichenko

Reputation: 145

How to create a model for anomaly detection in H2O-R

I'm trying to run H2O's anomaly detection in R (h2o_3.14.0.2).

First, I tried to use my main deep learning model and got this error:

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Only for AutoEncoder Deep Learning model."
 ...

OK, my bad. I've set autoencoder to TRUE:

h2o.deeplearning(y = response, training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)

And got a new error:

Error in .verify_dataxy(training_frame, x, y, autoencoder): `y` should not be specified for autoencoder=TRUE, remove `y` input
Traceback:

1. h2o.deeplearning(y = response, training_frame = training.frame, 
 .     validation_frame = test.frame, autoencoder = TRUE)
2. .verify_dataxy(training_frame, x, y, autoencoder)
3. stop("`y` should not be specified for autoencoder=TRUE, remove `y` input")

OK, so I should've removed y:

h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)

But:

Error in is.numeric(y): argument "y" is missing, with no default
Traceback:

1. h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, 
 .     autoencoder = TRUE)
2. is.numeric(y)

Hm, the last two requirements look mutually exclusive. But OK, I'll try another model:

anomaly.detection.model <- h2o.glrm(training_frame = training.frame, k = 10, seed = common.seed)

h2o.anomaly(anomaly.detection.model, training.frame, per_feature = FALSE)

And got another type of error:

java.lang.AssertionError
 [1] "java.lang.AssertionError"                                                                                    
 [2] "    water.api.ModelMetricsHandler.predict(ModelMetricsHandler.java:439)"
 ...

The failed assertion is assert s.reconstruct_train; (I haven't dug into it yet). Maybe I'll have better luck with GBM or RF?

model = h2o.gbm(y = response,
                training_frame = training.frame,
                validation_frame = validation.frame,
                max_hit_ratio_k = 10,
                seed = common.seed,
                stopping_rounds = 3,
                stopping_tolerance = 1e-2)

h2o.anomaly(model, training.frame, per_feature = FALSE)

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Requires a Deep Learning, GLRM, DRF or GBM model."

And the same for RF.

So I have two questions:

  1. How do I detect anomalies?
  2. Are these bugs, or did I do something wrong?

Upvotes: 1

Views: 1285

Answers (2)

vlad1490

Reputation: 365

I tried to detect anomalies in time-series data myself. To learn the concept I followed this blog, and its explanations worked well for me.

I hope to contribute a visual representation of what happens when we detect an anomaly. In the example, a Deep Learning model was fit on this ECG dataset. The data looks like this:

[Figure: data the Deep Learning model was fit on]

After that we provide a test dataset (containing an anomaly) that looks like this:

[Figure: data the Deep Learning model is tested on]

The anomaly is detected through the reconstruction error: the model reconstructs each test row, and rows it reconstructs poorly stand out with a high MSE (mean squared error).

[Figure: what the model 'sees' on the test dataset]

The per-row MSE can be obtained as in the example:

[Figure: MSE output]
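One possible way to turn the per-row MSE into a list of suspected anomalies is a simple cutoff. The sketch below is only an illustration: the helper name `flag_anomalies` and the 0.95 quantile are assumptions made here, not something taken from the blog, and a sensible cutoff depends on the data.

```r
# Hypothetical helper: return the indices of rows whose reconstruction
# MSE lies above a chosen quantile of all MSE values.
# `prob = 0.95` is an arbitrary illustrative cutoff, not a recommendation.
flag_anomalies <- function(mse, prob = 0.95) {
  threshold <- stats::quantile(mse, probs = prob, names = FALSE)
  which(mse > threshold)
}

# Usage with an H2O autoencoder, assuming a running cluster and the
# hypothetical objects `ae_model` (autoencoder) and `test_frame` (H2OFrame):
# mse <- as.data.frame(h2o.anomaly(ae_model, test_frame, per_feature = FALSE))[[1]]
# flag_anomalies(mse)
```

With `per_feature = FALSE`, `h2o.anomaly` returns one error value per row, so the first column of the result is the vector to pass in.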

Upvotes: 1

AvkashChauhan

Reputation: 20556

Setting autoencoder = TRUE turns this into an unsupervised problem, so there is no response (y) to set.

However, with autoencoder = TRUE you still need to set x. The error you see above appears because no predictors (x) were passed. Once you set x, the problem will go away.
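Applied to the call from the question, that might look like the sketch below. It assumes `response` holds the name of the response column, that every other column of `training.frame` should be a predictor, and that an H2O cluster is already running; the helper names `predictor_columns` and `train_autoencoder` are hypothetical.

```r
# Hypothetical helper: every column except the response becomes a predictor.
predictor_columns <- function(all_columns, response) {
  setdiff(all_columns, response)
}

# Hypothetical wrapper: train an autoencoder with `x` set and no `y`,
# then return the per-row reconstruction error as an H2OFrame.
train_autoencoder <- function(training_frame, response) {
  predictors <- predictor_columns(names(training_frame), response)
  model <- h2o::h2o.deeplearning(x = predictors,
                                 training_frame = training_frame,
                                 autoencoder = TRUE)
  h2o::h2o.anomaly(model, training_frame, per_feature = FALSE)
}

# Usage, assuming the objects from the question exist:
# errors <- train_autoencoder(training.frame, response)
```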

Here is a quick anomaly detection test I ran (learn more in this blog) with H2O 3.14.0.2 in R:

  > library(h2o)
  > h2o.init()
  Reading in config file: ./.h2oconfig

  H2O is not running yet, starting it now...

  Note:  In case of errors look at the following log files:
      /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.out
      /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.err

  java version "1.8.0_101"
  Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
  Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

  Starting H2O JVM and connecting: .. Connection successful!

  R is connected to the H2O cluster: 
      H2O cluster uptime:         1 seconds 948 milliseconds 
      H2O cluster version:        3.14.0.2 
      H2O cluster version age:    24 days  
      H2O cluster name:           H2O_started_from_R_avkashchauhan_alj381 
      H2O cluster total nodes:    1 
      H2O cluster total memory:   3.56 GB 
      H2O cluster total cores:    8 
      H2O cluster allowed cores:  8 
      H2O cluster healthy:        TRUE 
      H2O Connection ip:          localhost 
      H2O Connection port:        54321 
      H2O Connection proxy:       NA 
      H2O Internal Security:      FALSE 
      H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
      R Version:                  R version 3.4.0 (2017-04-21) 

  > mtcar = h2o.importFile('https://raw.githubusercontent.com/woobe/H2O_London_Workshop/master/data/auto_design.csv')
    |==================================================================================================================================| 100%
  > mtcar$gear = as.factor(mtcar$gear)
  > mtcar$carb = as.factor(mtcar$carb)
  > mtcar$cyl = as.factor(mtcar$cyl)
  > mtcar$vs = as.factor(mtcar$vs)
  > mtcar$am = as.factor(mtcar$am)
  > mtcar.dl = h2o.deeplearning(x = 2:12, training_frame = mtcar, autoencoder = TRUE, hidden = c(1,1,1), epochs = 100,seed=1)
    |==================================================================================================================================| 100%
  > errors <- h2o.anomaly(mtcar.dl, mtcar, per_feature = TRUE)
  > print(errors)
    reconstr_carb.1.SE reconstr_carb.2.SE reconstr_carb.3.SE reconstr_carb.4.SE reconstr_carb.6.SE reconstr_carb.8.SE
  1                  0                  0                  0                  1                  0                  0
  2                  0                  0                  0                  1                  0                  0
  3                  1                  0                  0                  0                  0                  0
  4                  1                  0                  0                  0                  0                  0
  5                  0                  1                  0                  0                  0                  0
  6                  1                  0                  0                  0                  0                  0
    reconstr_carb.missing(NA).SE reconstr_cyl.4.SE reconstr_cyl.6.SE reconstr_cyl.8.SE reconstr_cyl.10.SE reconstr_cyl.missing(NA).SE
  1                            0                 0                 1                 0                  0                           0
  2                            0                 0                 1                 0                  0                           0
  3                            0                 1                 0                 0                  0                           0
  4                            0                 0                 1                 0                  0                           0
  5                            0                 0                 0                 1                  0                           0
  6                            0                 0                 1                 0                  0                           0
    reconstr_gear.3.SE reconstr_gear.4.SE reconstr_gear.5.SE reconstr_gear.missing(NA).SE reconstr_vs.0.SE reconstr_vs.1.SE
  1                  0                  1                  0                            0                1                0
  2                  0                  1                  0                            0                1                0
  3                  0                  1                  0                            0                0                1
  4                  1                  0                  0                            0                0                1
  5                  1                  0                  0                            0                1                0
  6                  1                  0                  0                            0                0                1
    reconstr_vs.missing(NA).SE reconstr_am.0.SE reconstr_am.1.SE reconstr_am.missing(NA).SE reconstr_mpg.SE reconstr_disp.SE reconstr_hp.SE
  1                          0                0                1                          0    8.705556e-05     0.0196626269   0.0035177471
  2                          0                0                1                          0    8.705556e-05     0.0196626269   0.0035177471
  3                          0                0                1                          0    2.684331e-04     0.0411916382   0.0045768080
  4                          0                1                0                          0    1.307597e-05     0.0004837585   0.0035177471
  5                          0                1                0                          0    1.779785e-03     0.0102131519   0.0007516691
  6                          0                1                0                          0    2.576469e-03     0.0038200199   0.0038147898
    reconstr_drat.SE reconstr_wt.SE reconstr_qsec.SE
  1      0.002147682    0.002080628      0.003914459
  2      0.002147682    0.002054817      0.003843678
  3      0.002153499    0.002111200      0.003646228
  4      0.002244072    0.002020654      0.003545225
  5      0.002235761    0.001998203      0.003843678
  6      0.002282261    0.001996213      0.003451600

  [32 rows x 28 columns]

You can also run GLRM on the same dataset, as shown below. You must set k, and there is no need to pass x with GLRM, but the dataset must not contain constant columns. That's why I pass GLRM the same filtered columns I used for Deep Learning.

> mtcar_glrm = mtcar[2:12]
> mtcar.glrm = h2o.glrm(training_frame = mtcar_glrm,seed=1, k = 5)

Upvotes: 0
