Reputation: 11
Let's say I have a string variable and I transform it using VectorIndexer. When I then train an XGBoost model on this variable, will it be treated as numeric or categorical?
Basically, I want to know whether the splits in the trees of the XGBoost model treat this variable as a number or as a category.
Upvotes: 0
Views: 76
Reputation: 67
VectorIndexer identifies which features should be treated as categorical and indexes them. It does this with a rule of thumb: any feature with at most maxCategories distinct values is treated as categorical, and every other feature is left as continuous.
In this example:
root
|-- season: integer (nullable = true)
|-- yr: integer (nullable = true)
|-- mnth: integer (nullable = true)
|-- hr: integer (nullable = true)
|-- holiday: integer (nullable = true)
|-- weekday: integer (nullable = true)
|-- workingday: integer (nullable = true)
|-- weathersit: integer (nullable = true)
|-- temp: double (nullable = true)
|-- atemp: double (nullable = true)
|-- hum: double (nullable = true)
|-- windspeed: double (nullable = true)
|-- cnt: integer (nullable = true)
yr (2 values), season (4 values), holiday (2 values), workingday (2 values), and weathersit (4 values) are all treated as categorical columns.
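To make the rule of thumb concrete, here is a minimal pure-Python sketch of the idea. It is not Spark code: the function name find_categorical_columns and the sample rows are hypothetical, and max_categories only mirrors the role of VectorIndexer's maxCategories parameter.

```python
def find_categorical_columns(rows, max_categories):
    """Return the names of columns with at most max_categories distinct values."""
    distinct = {}
    for row in rows:
        for name, value in row.items():
            distinct.setdefault(name, set()).add(value)
    return sorted(
        name for name, values in distinct.items()
        if len(values) <= max_categories
    )


# Hypothetical sample rows, loosely shaped like the schema above.
rows = [
    {"season": 1, "yr": 0, "temp": 0.24},
    {"season": 2, "yr": 1, "temp": 0.22},
    {"season": 3, "yr": 0, "temp": 0.36},
    {"season": 4, "yr": 1, "temp": 0.41},
    {"season": 1, "yr": 1, "temp": 0.55},
]

# season has 4 distinct values, yr has 2, temp has 5.
categorical = find_categorical_columns(rows, max_categories=4)
print(categorical)
```

With a threshold of 4, season and yr count as categorical, while temp has too many distinct values and stays continuous.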
But to answer your question:
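XGBoost only takes numerical values as input, so data in any other form must be converted into that format first. Unless categorical support is explicitly enabled in your XGBoost version, the indexed variable is then treated as a number: each tree split compares the feature value against a numeric threshold, so the category indices are handled as ordered numeric values.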
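As a rough illustration of what numeric treatment means for an indexed category, the sketch below lists the groups that a single "value < threshold" split can send to the left branch. It assumes the model treats the indexed column as an ordinary number; the category names, the index mapping, and the helper reachable_groups are hypothetical and not part of XGBoost or Spark.

```python
def reachable_groups(index_of):
    """List the left-branch groups a single `value < threshold` split can produce."""
    # Order the categories by their numeric index, as a threshold split sees them.
    ordered = sorted(index_of, key=index_of.get)
    # Any threshold placed between two neighbouring indices cuts off a prefix.
    return [set(ordered[:i]) for i in range(1, len(ordered))]


# Hypothetical index mapping produced by an indexing step.
index_of = {"red": 0.0, "green": 1.0, "blue": 2.0}

groups = reachable_groups(index_of)
for group in groups:
    print(sorted(group))

# "green" sits between the other two indices, so one split cannot isolate it.
can_isolate_green = {"green"} in groups
print(can_isolate_green)
```

A single split can only separate categories along the order of their indices, so the arbitrary index order affects which groupings the tree can reach cheaply. Isolating a middle category takes two splits.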
Upvotes: -1