Reputation: 131
I am trying to fit a LightGBM regressor in Python and it gives me an error. I have a dataset where all the predictors are categorical and my target variable is continuous numeric. Since all my X variables are categorical, I converted them into numeric form using label encoding. After that, I passed my categorical variables to LGBMRegressor so that the algorithm can handle them accordingly.
# lightgbm for regression
import numpy as np
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
df = pd.read_csv("TrainModelling.csv")
df.drop(df.columns[0],axis=1,inplace=True) #Remove index column
y = df["Target"]
X = df.drop("Target", axis=1)
le = preprocessing.LabelEncoder()
X = X.apply(le.fit_transform)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
hyper_params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['l2', 'auc'],
    'learning_rate': 0.005,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.7,
    'bagging_freq': 10,
    'verbose': 0,
    "max_depth": 8,
    "num_leaves": 128,
    "max_bin": 512,
    "num_iterations": 100000,
    "n_estimators": 1000
}
cat_feature_list = np.where(X.dtypes != float)[0]
gbm = lgb.LGBMRegressor(**hyper_params, categorical_feature=cat_feature_list)
gbm.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric='l1',
        early_stopping_rounds=1000)
The error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Upvotes: 1
Views: 2016
Reputation: 18296
This line is problematic:
cat_feature_list = np.where(X.dtypes != float)[0]
(I wish you had shared the whole traceback of the error; it could have saved time.)
X.dtypes != float
gives a pandas Series of booleans, and np.where(...)[0] turns it into a NumPy array of column positions. The truthiness of an array with more than one element then gets evaluated, hence the error. To get the names of the categorical columns in a plain list instead:
cat_feature_list = X.select_dtypes("object").columns.tolist()
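To make the difference concrete, here is a minimal sketch on a made-up frame (the column names and values are hypothetical, not taken from the question's CSV). It reproduces the "ambiguous truth value" message by evaluating the NumPy array as a bool, then builds the plain Python lists instead. It only illustrates the error and the list-based alternative; it does not call LightGBM.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's predictors.
# dtype=object is set explicitly so the string columns are object columns.
X = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red"], dtype=object),
    "size": pd.Series(["S", "M", "L"], dtype=object),
    "weight": [1.0, 2.5, 3.0],
})

# Boolean Series: True for every column that is not float
mask = X.dtypes != float

# np.where(...)[0] is a NumPy array of positions, not a list
positions = np.where(mask)[0]

# Using a multi-element array where a single bool is expected raises ValueError
try:
    bool(positions)
    truthiness_error = None
except ValueError as exc:
    truthiness_error = str(exc)

# Plain list of the object-dtype column names
cat_feature_list = X.select_dtypes("object").columns.tolist()

# Plain list of positions, if indices are preferred over names
cat_feature_idx = positions.tolist()

print(truthiness_error)
print(cat_feature_list)
print(cat_feature_idx)
```

One caveat: in the question the columns are label-encoded to integers before this line runs, so select_dtypes("object") on the encoded X would return an empty list. The names would have to be collected before the encoding step (or the positions converted with .tolist() as above).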
Upvotes: 1