Having Trouble Converting Categorical variable to float in CatBoost

Question

I have gone through other answers on SO and GitHub. There is some problem with the conversion of categorical variables to float, and I cannot think of a solution.

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
import catboost as cb
import numpy as np
import pandas as pd

data = pd.read_csv("flights.csv")
data = data[["MONTH","DAY","DAY_OF_WEEK","AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT",
                 "ORIGIN_AIRPORT","AIR_TIME", "DEPARTURE_TIME","DISTANCE", "DEPARTURE_DELAY","ARRIVAL_DELAY"]]
data.dropna(inplace=True)

print(data)

# cat_vars = [var for var in data.columns if data[var].dtype == "O"]
categorical_features_indices = np.where(data.dtypes != float)[0]  # np.float was removed in NumPy 1.24


cols = ["AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT","ORIGIN_AIRPORT"]
 
train, test, y_train, y_test = train_test_split(data.drop(["ARRIVAL_DELAY"], axis=1), data["ARRIVAL_DELAY"],
                                                random_state=10, test_size=0.25)
model = cb.CatBoostRegressor(loss_function='RMSE')
param_dist = {"max_depth": [10, 15],
              "n_estimators": [50, 60],
              "learning_rate": [0.1, 0.15]}
search = GridSearchCV(estimator=model, param_grid=param_dist, cv=3).fit(train, y_train)
print("\n The best parameters across ALL searched params:\n", search.best_params_)

The error pops up after running GridSearchCV and is:

Cannot convert 'b'UA'' to float

The last column of my dataset is DEPARTURE_DELAY and contains integers. The target variable also contains integers. It would be great if anyone could help me solve this problem.

Answers (0)