Reputation: 1430
I know this question has been asked multiple times, but I've run out of ideas for getting the model to work. The first 50 rows of the training data:
> train[1:50]
a b c d e f g h i j k l m
1: 0 148.00 27 16 0 A 0 117 92 0 13 271 2
2: 0 207.00 37 8 0 C 0 46 29 0 29 555 5
3: 0 1497.00 44 1 0 A 1 3754 2119 1 1961 5876 6
4: 0 463.00 44 1 0 A 0 287 202 0 105 1037 4
5: 0 19.00 82 1 0 A 0 301 186 0 344 2116 3
6: 0 204.00 41 1 0 A 0 92 76 0 290 1608 10
7: 0 79.00 69 16 0 B 0 48 29 0 1 27 3
8: 0 256.75 71 16 1 A 0 131 112 0 36 1183 0
9: 0 256.75 71 16 1 A 0 131 112 0 36 1183 2
10: 1 49.00 13 13 0 C 0 5 4 0 0 11 1
11: 0 19.00 76 1 0 A 0 897 440 0 575 2674 3
12: 0 49.00 100 100 0 C 0 6 6 0 0 0 1
13: 0 107.00 65 1 0 A 3 334 212 0 421 2773 6
14: 0 79.00 28 16 0 B 0 42 49 0 13 345 2
15: 0 1742.00 61 1 0 A 0 589 340 0 444 3853 8
16: 0 187.00 20 16 0 A 0 123 99 0 70 841 4
17: 0 68.00 73 1 0 A 0 757 507 0 359 773 3
18: 0 157.00 32 16 0 B 0 33 27 0 4 144 2
19: 0 49.00 52 16 0 C 0 10 7 0 2 51 3
20: 0 79.00 53 16 0 B 0 20 9 0 0 40 4
21: 0 68.00 45 1 0 A 0 370 245 0 298 1826 3
22: 0 1074.00 46 1 0 A 0 605 220 0 280 1421 7
23: 0 19.00 84 1 0 A 0 357 214 0 104 1273 3
24: 0 68.00 42 1 0 A 0 107 97 0 224 1526 3
25: 0 226.00 39 1 0 A 0 228 162 0 139 559 3
26: 0 49.00 92 16 0 C 0 4 3 0 0 0 3
27: 0 68.00 46 1 0 A 0 155 104 0 60 1170 3
28: 1 98.00 29 2 0 C 0 15 13 0 1 659 3
29: 0 248.00 44 1 0 A 0 347 204 0 281 1484 4
30: 0 19.00 84 1 0 A 0 302 166 0 170 2800 3
31: 0 444.00 20 16 0 A 0 569 411 1 369 1095 4
32: 0 157.00 20 16 0 B 0 38 30 0 18 265 3
33: 0 208.00 71 16 0 B 0 22 22 0 1 210 3
34: 1 84.00 27 13 0 A 0 37 24 0 1 649 1
35: 1 297.00 17 7 0 A 0 26 21 0 0 0 1
36: 1 49.00 43 16 1 C 0 4 4 0 0 0 2
37: 0 99.00 36 1 0 A 0 614 432 0 851 2839 4
38: 0 354.00 91 2 1 C 0 74 48 0 102 1005 9
39: 0 68.00 62 16 0 A 0 42 32 0 0 0 3
40: 0 49.00 78 16 0 C 0 12 10 0 0 95 3
41: 0 49.00 57 16 0 C 1 9 8 0 1 582 3
42: 0 68.00 49 1 0 A 0 64 47 0 49 112 3
43: 0 583.00 70 2 1 A 0 502 293 0 406 2734 9
44: 0 187.00 29 1 0 A 0 186 129 0 118 2746 5
45: 0 178.00 52 1 0 A 0 900 484 0 180 1701 4
46: 1 98.00 50 44 0 C 0 13 12 0 1 647 4
47: 1 548.00 21 14 0 A 0 19 14 0 0 0 1
48: 0 178.00 28 16 0 C 0 43 33 0 6 921 3
49: 1 49.00 20 20 0 C 0 8 6 0 0 0 1
50: 0 49.00 124 124 1 A 0 14 11 0 0 0 1
a b c d e f g h i j k l m
This data is not normalised, but that doesn't matter at this stage. I can't get a simple gbm model to work using the gbm package:
> require(gbm)
> gbm_model <- gbm(a ~ .
    , data = train
    , distribution = "bernoulli"
    , n.trees = 10
    , shrinkage = 0.001
    , bag.fraction = 1
    , train.fraction = 0.5
    , n.minobsinnode = 3
    , cv.folds = 0 # no cross-validation
    , keep.data = TRUE
    , verbose = TRUE
    )
Iter TrainDeviance ValidDeviance StepSize Improve
1 nan nan 0.0010 nan
2 nan nan 0.0010 nan
3 nan nan 0.0010 nan
4 nan nan 0.0010 nan
5 nan nan 0.0010 nan
6 nan nan 0.0010 nan
7 nan nan 0.0010 nan
8 nan nan 0.0010 nan
9 nan nan 0.0010 nan
10 nan nan 0.0010 nan
Columns 'e' and 'f' are factors. The training data has approximately 6,000 rows. I've tried running gbm with various bag.fraction, train.fraction, n.trees, and shrinkage values, but I still get all NaNs. Tree models and SVMs work without any problem on the same data. I even tried converting column 'f' to character, as suggested in previous posts, and it didn't help.
Edit: the data has no NAs or invalid values. I also tried one-hot encoding the 'f' column, along the lines of the sketch below, and still got the same result.
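For reference, the one-hot encoding attempt looked roughly like this (a sketch only, assuming train is a data.table and 'f' is a factor; the exact code differed):

# build one indicator column per level of 'f' and drop the original column
f_dummies <- model.matrix(~ f - 1, data = train)   # gives columns fA, fB, fC
train_ohe <- cbind(train[, !"f"], f_dummies)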
Upvotes: 4
Views: 1288
Reputation: 1326
In my case, this issue was resolved by converting the dependent variable to character:
gbm_model <- gbm(as.character(a) ~ .
    , data = train
    , distribution = "bernoulli"
    , n.trees = 10
    , shrinkage = 0.001
    , bag.fraction = 1
    , train.fraction = 0.5
    , n.minobsinnode = 3
    , cv.folds = 0 # no cross-validation
    , keep.data = TRUE
    , verbose = TRUE
    )
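As far as I can tell, a common cause of this is that distribution = "bernoulli" expects the response to be numeric and coded strictly as 0/1; when the response comes in as a factor, the deviance calculation breaks down and every iteration reports nan. An alternative to the as.character() trick is to convert the response explicitly before fitting (a sketch, assuming 'a' contains only the values 0 and 1 as a factor or character):

# make the response an explicit numeric 0/1 vector (assumes 'a' contains only 0 and 1)
train$a <- as.numeric(as.character(train$a))
stopifnot(all(train$a %in% c(0, 1)))   # bernoulli deviance is only defined for 0/1 outcomes

After this conversion, the original call with a ~ . should run without the nan deviances.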
Upvotes: 1