Reputation: 1
I have gone through other answers on SO and GitHub. The problem seems to be the conversion of categorical variables to float, and I cannot think of a solution.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
import catboost as cb
data = pd.read_csv("flights.csv")
data = data[["MONTH","DAY","DAY_OF_WEEK","AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT",
"ORIGIN_AIRPORT","AIR_TIME", "DEPARTURE_TIME","DISTANCE", "DEPARTURE_DELAY","ARRIVAL_DELAY"]]
data.dropna(inplace=True)
print(data)
# cat_vars = [var for var in data.columns if data[var].dtype == "O"]
categorical_features_indices = np.where(data.dtypes != float)[0]  # np.float was removed in NumPy 1.24; the builtin float is equivalent
cols = ["AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT","ORIGIN_AIRPORT"]
train, test, y_train, y_test = train_test_split(data.drop(["ARRIVAL_DELAY"], axis=1), data["ARRIVAL_DELAY"],
random_state=10, test_size=0.25)
model = cb.CatBoostRegressor(loss_function='RMSE')
param_dist = {"max_depth": [10,15],
"n_estimators": [50, 60],
"learning_rate": [0.1, 0.15],}
search = GridSearchCV(estimator=model, param_grid = param_dist, cv = 3).fit(train, y_train)
print("\n The best parameters across ALL searched params:\n",search.best_params_)
The error pops up after running GridSearchCV and is:
Cannot convert 'b'UA'' to float
My dataset looks like this: [image: dataset preview]. The last column is DEPARTURE_DELAY and contains integers. The target variable also contains integers. It would be great if anyone could help me solve this problem.
Upvotes: 0
Views: 308