xfkay

Reputation: 11

Are variables passed through VectorIndexer treated as categorical or numeric in XGBoost?

Let's say I have a string variable and I transform it using VectorIndexer. When I then train an XGBoost model using this variable, will it be treated as numeric or categorical?

Basically, I want to know whether the splits in the trees of the XGBoost model treat this variable as a number or as a category.

Upvotes: 0

Views: 76

Answers (1)

Zakaria Hamane

Reputation: 67

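VectorIndexer identifies which features in a vector column should be treated as categorical. It uses a rule of thumb: any feature with at most maxCategories distinct values is treated as categorical and its values are re-indexed starting from 0; every other feature is left as continuous.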

In this example:

root
 |-- season: integer (nullable = true)
 |-- yr: integer (nullable = true)
 |-- mnth: integer (nullable = true)
 |-- hr: integer (nullable = true)
 |-- holiday: integer (nullable = true)
 |-- weekday: integer (nullable = true)
 |-- workingday: integer (nullable = true)
 |-- weathersit: integer (nullable = true)
 |-- temp: double (nullable = true)
 |-- atemp: double (nullable = true)
 |-- hum: double (nullable = true)
 |-- windspeed: double (nullable = true)
 |-- cnt: integer (nullable = true)

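With a small threshold on the number of distinct values (for example, VectorIndexer's maxCategories parameter set to 4), yr (2 values), season (4 values), holiday (2 values), workingday (2 values), and weathersit (4 values) are all treated as categorical columns.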
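The rule of thumb described above can be sketched in plain Python. This is only an illustration of the idea, not Spark's implementation: the helper find_categorical_columns and its max_categories argument are hypothetical names (the latter stands in for VectorIndexer's maxCategories parameter), and the row values are made up.

```python
# Illustrative sketch only: this is NOT Spark's VectorIndexer implementation.
# find_categorical_columns and max_categories are hypothetical names.

def find_categorical_columns(rows, max_categories):
    """Return {column: sorted distinct values} for every column that has
    at most max_categories distinct values."""
    distinct = {}
    for row in rows:
        for name, value in row.items():
            distinct.setdefault(name, set()).add(value)
    return {
        name: sorted(values)
        for name, values in distinct.items()
        if len(values) <= max_categories
    }

# Toy rows reusing a few column names from the schema (values are made up).
rows = [
    {"season": 1, "yr": 0, "hr": 0, "temp": 0.24},
    {"season": 2, "yr": 1, "hr": 5, "temp": 0.22},
    {"season": 3, "yr": 0, "hr": 9, "temp": 0.80},
    {"season": 4, "yr": 1, "hr": 13, "temp": 0.46},
    {"season": 1, "yr": 0, "hr": 17, "temp": 0.62},
    {"season": 2, "yr": 1, "hr": 21, "temp": 0.30},
]

categorical = find_categorical_columns(rows, max_categories=4)
print(sorted(categorical))  # ['season', 'yr']
```

Here hr and temp each have six distinct values in the toy data, so they stay continuous, while season and yr fall under the threshold.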

But to answer your question:

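XGBoost takes only numerical values as input, so data in any other form must be converted to numbers first. By default, an indexed column is therefore treated as numeric: each tree split is a threshold on the index value (for example, index < 1.5), not a test for membership in a set of categories. If you want the variable to behave as a category, one common approach is to one-hot encode the indexed column before training.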
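To illustrate what treating an indexed column as a number means for a tree split, here is a minimal plain-Python sketch. It does not call XGBoost; the labels, index values, thresholds, and helper names (numeric_split, one_hot) are all made up for illustration.

```python
# Illustrative sketch only; it does not use XGBoost or Spark.
# Suppose a string column was indexed as: "red" -> 0, "green" -> 1, "blue" -> 2.
index_of = {"red": 0.0, "green": 1.0, "blue": 2.0}

def numeric_split(value, threshold):
    """A numeric tree split: go left if value < threshold, else right."""
    return "left" if value < threshold else "right"

# A single threshold on the index can only separate low indices from high ones,
# so it cannot put "green" alone on one side.
by_index = {label: numeric_split(idx, threshold=1.5)
            for label, idx in index_of.items()}
print(by_index)  # {'red': 'left', 'green': 'left', 'blue': 'right'}

def one_hot(label):
    """Encode a label as one 0/1 column per known category."""
    return [1.0 if label == name else 0.0 for name in index_of]

# After one-hot encoding, a threshold on the "green" column isolates green.
green_col = list(index_of).index("green")
by_one_hot = {label: numeric_split(one_hot(label)[green_col], threshold=0.5)
              for label in index_of}
print(by_one_hot)  # {'red': 'left', 'green': 'right', 'blue': 'left'}
```

With the raw index, the ordering of the indices determines which groupings a single split can express; with one-hot columns, each category can be separated on its own.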

Upvotes: -1
