Ankhnesmerira

Reputation: 1430

GBM Bernoulli returns no results with NaN

I know this question has been asked multiple times but I've run out of ideas to get the model working. The first 50 rows of the train data:

> train[1:50]
    a       b   c   d e f g    h    i j    k    l  m
 1: 0  148.00  27  16 0 A 0  117   92 0   13  271  2
 2: 0  207.00  37   8 0 C 0   46   29 0   29  555  5
 3: 0 1497.00  44   1 0 A 1 3754 2119 1 1961 5876  6
 4: 0  463.00  44   1 0 A 0  287  202 0  105 1037  4
 5: 0   19.00  82   1 0 A 0  301  186 0  344 2116  3
 6: 0  204.00  41   1 0 A 0   92   76 0  290 1608 10
 7: 0   79.00  69  16 0 B 0   48   29 0    1   27  3
 8: 0  256.75  71  16 1 A 0  131  112 0   36 1183  0
 9: 0  256.75  71  16 1 A 0  131  112 0   36 1183  2
10: 1   49.00  13  13 0 C 0    5    4 0    0   11  1
11: 0   19.00  76   1 0 A 0  897  440 0  575 2674  3
12: 0   49.00 100 100 0 C 0    6    6 0    0    0  1
13: 0  107.00  65   1 0 A 3  334  212 0  421 2773  6
14: 0   79.00  28  16 0 B 0   42   49 0   13  345  2
15: 0 1742.00  61   1 0 A 0  589  340 0  444 3853  8
16: 0  187.00  20  16 0 A 0  123   99 0   70  841  4
17: 0   68.00  73   1 0 A 0  757  507 0  359  773  3
18: 0  157.00  32  16 0 B 0   33   27 0    4  144  2
19: 0   49.00  52  16 0 C 0   10    7 0    2   51  3
20: 0   79.00  53  16 0 B 0   20    9 0    0   40  4
21: 0   68.00  45   1 0 A 0  370  245 0  298 1826  3
22: 0 1074.00  46   1 0 A 0  605  220 0  280 1421  7
23: 0   19.00  84   1 0 A 0  357  214 0  104 1273  3
24: 0   68.00  42   1 0 A 0  107   97 0  224 1526  3
25: 0  226.00  39   1 0 A 0  228  162 0  139  559  3
26: 0   49.00  92  16 0 C 0    4    3 0    0    0  3
27: 0   68.00  46   1 0 A 0  155  104 0   60 1170  3
28: 1   98.00  29   2 0 C 0   15   13 0    1  659  3
29: 0  248.00  44   1 0 A 0  347  204 0  281 1484  4
30: 0   19.00  84   1 0 A 0  302  166 0  170 2800  3
31: 0  444.00  20  16 0 A 0  569  411 1  369 1095  4
32: 0  157.00  20  16 0 B 0   38   30 0   18  265  3
33: 0  208.00  71  16 0 B 0   22   22 0    1  210  3
34: 1   84.00  27  13 0 A 0   37   24 0    1  649  1
35: 1  297.00  17   7 0 A 0   26   21 0    0    0  1
36: 1   49.00  43  16 1 C 0    4    4 0    0    0  2
37: 0   99.00  36   1 0 A 0  614  432 0  851 2839  4
38: 0  354.00  91   2 1 C 0   74   48 0  102 1005  9
39: 0   68.00  62  16 0 A 0   42   32 0    0    0  3
40: 0   49.00  78  16 0 C 0   12   10 0    0   95  3
41: 0   49.00  57  16 0 C 1    9    8 0    1  582  3
42: 0   68.00  49   1 0 A 0   64   47 0   49  112  3
43: 0  583.00  70   2 1 A 0  502  293 0  406 2734  9
44: 0  187.00  29   1 0 A 0  186  129 0  118 2746  5
45: 0  178.00  52   1 0 A 0  900  484 0  180 1701  4
46: 1   98.00  50  44 0 C 0   13   12 0    1  647  4
47: 1  548.00  21  14 0 A 0   19   14 0    0    0  1
48: 0  178.00  28  16 0 C 0   43   33 0    6  921  3
49: 1   49.00  20  20 0 C 0    8    6 0    0    0  1
50: 0   49.00 124 124 1 A 0   14   11 0    0    0  1
    a       b   c   d e f g    h    i j    k    l  m

This data is not normalised, but that shouldn't matter at this stage. I can't get a simple GBM model to work using the gbm package:

> require(gbm)
> gbm_model <- gbm(a ~ .
                 , data = train
                 , distribution = "bernoulli"
                 , n.trees = 10
                 , shrinkage = 0.001
                 , bag.fraction = 1
                 , train.fraction = 0.5
                 , n.minobsinnode = 3
                 , cv.folds = 0 # no cross-validation
                 , keep.data = TRUE
                 , verbose = TRUE
    )

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1           nan             nan     0.0010       nan
     2           nan             nan     0.0010       nan
     3           nan             nan     0.0010       nan
     4           nan             nan     0.0010       nan
     5           nan             nan     0.0010       nan
     6           nan             nan     0.0010       nan
     7           nan             nan     0.0010       nan
     8           nan             nan     0.0010       nan
     9           nan             nan     0.0010       nan
    10           nan             nan     0.0010       nan

Columns 'e' and 'f' are factors. The train data has approximately 6,000 rows. I've tried running gbm with various bag.fraction, train.fraction, n.trees, and shrinkage values, but I still get all NaNs. Trees and SVM work without any problem on the same data. I also tried converting column 'f' to character, as suggested in previous posts, and that didn't work either.


Edit: the data has no NAs or invalid values. I also tried one-hot encoding the 'f' column and still got the same results.
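For anyone trying to reproduce this, the checks behind "no NAs or invalid values" can be sketched roughly as below. This is only an illustration in base R: `toy` is a small made-up data frame standing in for `train` (its values are not the real data), and the variable names are hypothetical. The last check looks at the class balance in the first half of the rows because, as far as I understand the gbm documentation, `train.fraction = 0.5` fits on the first half of the data in row order rather than on a random sample.

```r
# Hypothetical toy data standing in for `train`; values are made up for illustration.
toy <- data.frame(
  a = c(0, 0, 1, 0, 1, 0),
  b = c(148, 207, 49, 463, 98, 19),
  e = factor(c(0, 0, 0, 1, 0, 0)),
  f = factor(c("A", "C", "C", "A", "C", "A"))
)

# How each column is stored (numeric, factor, character, ...).
col_classes <- sapply(toy, class)
print(col_classes)

# Number of missing values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of non-finite values (NaN, Inf, -Inf) in the numeric columns only.
num_cols <- sapply(toy, is.numeric)
nonfinite_counts <- sapply(toy[num_cols], function(x) sum(!is.finite(x)))
print(nonfinite_counts)

# Class balance of the response over all rows.
print(table(toy$a))

# Class balance in the first half of the rows (assumption: this is the part
# gbm would train on with train.fraction = 0.5).
n_train <- floor(0.5 * nrow(toy))
print(table(toy$a[seq_len(n_train)]))
```

Swapping `toy` for the real `train` (after `as.data.frame(train)` if it is a data.table) should show whether any column is stored in an unexpected way or whether the training half contains only one class.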

Upvotes: 4

Views: 1288

Answers (1)

info_seekeR

Reputation: 1326

In my case, this issue was resolved by converting the dependent variable to character.

gbm_model <- gbm(as.character(a) ~ .
                 , data = train
                 , distribution = "bernoulli"
                 , n.trees = 10
                 , shrinkage = 0.001
                 , bag.fraction = 1
                 , train.fraction = 0.5
                 , n.minobsinnode = 3
                 , cv.folds = 0 # no cross-validation
                 , keep.data = TRUE
                 , verbose = TRUE
    )

Upvotes: 1
