My data looks something like this:
structure(list(response = c("NoResponse", "NoResponse", "Response",
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "Response",
"NoResponse", "NoResponse"), cancer_type = structure(c(8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), levels = c("Adenoid cystic carcinoma",
"Breast", "Cholangiocarcinoma", "Colorectal", "Germ Cell", "Head and neck squamous cell carcinoma",
"Lymphoma", "Melanoma", "NSCLC", "Oesophageal", "Pancreatic",
"Renal Clear Cell Carcinoma", "Sarcoma", "Stomach adenocarcinoma",
"sweat gland carcinoma", "Thymic Carcinoma", "Urothelial Bladder Carcinoma",
"Uterine corpus endometrial carcinoma", "Uveal melanoma"), class = "factor"),
Treatment = structure(c(6L, 5L, 6L, 6L, 6L, 5L, 6L, 5L, 6L,
5L), levels = c("anti-CTLA4", "anti-CTLA4 + anti-PD1", "anti-CTLA4 + anti-PDL1",
"anti-CTLA4 + anti-PDL1 + Alimta + Paraplatin", "anti-PD1",
"anti-PD1 (after anti-CTLA4)", "anti-PD1 + anti-CTLA4", "anti-PD1 + anti-IDO1",
"anti-PD1 + anti-KIR", "anti-PD1 + anti-LAG3", "anti-PD1 + anti+CTLA4",
"anti-PD1 + Herceptin", "anti-PD1 + NVB + Gemzar", "anti-PDL1",
"anti-PDL1 + anti-VEGF-A", "anti-PDL1 + Axitinib", "anti-PDL1 + PF-04518600",
"anti-PDL1 + SMAC"), class = "factor"), B.cells = c(0.0928073704220432,
0.0452143935493372, 1.30047878079526, 0.184967800962064,
0.0328904854435036, 0.0416414264467815, 0.00647774047514386,
0.653999365837062, 0.0506147836795817, 0.225440581016202),
CD4..memory.T.cells = c(0.04679171356058, 0, 0.24081994997988,
0, 0.0084070550945875, 0, 0, 0.0704387567897827, 0.0162007196539715,
0.0538907493278964), CD4..naive.T.cells = c(0, 0, 0.222121262122827,
0, 0, 0, 0, 0.0337776019379054, 0, 0), CD4..Tem = c(0.143576212061698,
0, 0.152923936572005, 0.191565445100194, 0.104205104847475,
0, 0, 0.117793698582659, 0.0956922304673, 0.120086195256724
), CD8..T.cells = c(0.0221692147248866, 0, 0.261136892323247,
0.0581410305553568, 0.021201558979391, 0.0344057714088149,
0, 0.0791463110435499, 0.00786274616219145, 0.0188003251730739
), CD8..Tcm = c(0.148092927249335, 0, 0.430297989210553,
0.216019483428908, 0.0507286063890634, 0.031306594576336,
0, 0.196851960745196, 0.111834265334993, 0.120204322607267
), Class.switched.memory.B.cells = c(0.0288426949470172,
0.0183792109912145, 0.36043436228306, 0.0322788399661325,
0, 0, 0.0141223906735437, 0.151803874587016, 0.0238553460299785,
0.105771253258905)), row.names = c("Pt1", "Pt10", "Pt101",
"Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2", "Pt24"), class = "data.frame")
As you can see, response is the target variable (binary). All other variables are predictors. All predictors are numeric except Treatment and cancer_type, which are categorical (stored as factors).
I'm trying to train a GBM model but, if I'm not mistaken, it needs all variables to be numeric. How do I convert the categorical ones? The Treatment feature in particular has many distinct values, because many different treatments were used.
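One common way to turn factor columns into numeric ones is one-hot (indicator) encoding with base R's model.matrix(). The following is only a minimal sketch on a toy data frame: the name df and its values are hypothetical stand-ins for the real data, and whether the GBM implementation in use actually requires this step is an assumption.

```r
# Toy data frame (hypothetical values) mirroring the structure shown above
df <- data.frame(
  response  = c("NoResponse", "Response", "NoResponse", "Response"),
  Treatment = factor(c("anti-PD1", "anti-CTLA4", "anti-PD1", "anti-PDL1")),
  B.cells   = c(0.09, 1.30, 0.03, 0.65)
)

# model.matrix() expands each factor into 0/1 indicator columns.
# "- 1" drops the intercept, so Treatment gets one column per level.
X <- model.matrix(~ . - 1, data = df[, c("Treatment", "B.cells")])

# Binary target coded as 0/1 (1 = "Response")
y <- as.integer(df$response == "Response")

colnames(X)
```

With many treatment levels this produces many sparse columns, so grouping rare treatments into a single "other" level before encoding may be worth considering.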
When I try fitting the model without changing the features it produces this error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
factor cancer_type has new levels Oesophageal
In addition: There were 50 or more warnings (use warnings() to see the first 50)
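The call in the message (model.frame.default with newdata and xlev) suggests the error is raised at the prediction step: the new data contains a cancer_type level that was not present in the rows used for fitting. A minimal sketch for checking this, assuming hypothetical data frames train and test (toy values, not the real split):

```r
# Hypothetical train/test split with a level that only occurs in test
train <- data.frame(cancer_type = factor(c("Melanoma", "NSCLC", "Melanoma")))
test  <- data.frame(cancer_type = factor(c("Melanoma", "Oesophageal")))

# Levels actually observed in each set (droplevels() discards unused levels)
seen   <- levels(droplevels(train$cancer_type))
unseen <- setdiff(levels(droplevels(test$cancer_type)), seen)
unseen

# One possible workaround (an assumption, not the only option):
# re-code test with the training levels, so unseen values become NA
test$cancer_type <- factor(test$cancer_type, levels = seen)
```

Alternatives include stratifying the split so that every level appears in the training set, or merging rare cancer types into one level before fitting.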