stats_geek

Reputation: 1

Having Trouble Converting Categorical Variables to Float in CatBoost

I have gone through other answers on SO and GitHub. Something goes wrong when CatBoost converts my categorical variables to float, and I cannot think of a solution.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
import catboost as cb

data = pd.read_csv("flights.csv")
data = data[["MONTH","DAY","DAY_OF_WEEK","AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT",
             "ORIGIN_AIRPORT","AIR_TIME", "DEPARTURE_TIME","DISTANCE", "DEPARTURE_DELAY","ARRIVAL_DELAY"]]
data.dropna(inplace=True)

print(data)

# cat_vars = [var for var in data.columns if data[var].dtype == "O"]
# Positions of all non-float columns. np.float was an alias of the builtin
# float and was removed in NumPy 1.24, so the builtin is used here.
categorical_features_indices = np.where(data.dtypes != float)[0]

# Candidate categorical columns (not used below)
cols = ["AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT","ORIGIN_AIRPORT"]

train, test, y_train, y_test = train_test_split(data.drop(["ARRIVAL_DELAY"], axis=1), data["ARRIVAL_DELAY"],
                                                random_state=10, test_size=0.25)
model = cb.CatBoostRegressor(loss_function='RMSE')
param_dist = {"max_depth": [10, 15],
              "n_estimators": [50, 60],
              "learning_rate": [0.1, 0.15]}
search = GridSearchCV(estimator=model, param_grid=param_dist, cv=3).fit(train, y_train)
print("\n The best parameters across ALL searched params:\n", search.best_params_)
print("\n The best parameters across ALL searched params:\n",search.best_params_)

The error is raised when GridSearchCV is fitted:

Cannot convert 'b'UA'' to float

My dataset looks like this (screenshot not included). The last column is DEPARTURE_DELAY and contains integers. The target variable also contains integers. It would be great if anyone could help me solve this problem.

Upvotes: 0

Views: 308

Answers (0)
