Nicolas123

Reputation: 131

LightGBM regression in Python: error with categorical values

I am trying to fit a LightGBM regressor in Python and it gives me an error. I have a dataset where all the predictors are categorical and the target variable is continuous. Since all my X variables are categorical, I converted them to numeric form using label encoding. After that, I passed the list of categorical variables to LGBMRegressor so that the algorithm handles them accordingly.

# lightgbm for regression
import numpy as np
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing


df = pd.read_csv("TrainModelling.csv")
df.drop(df.columns[0],axis=1,inplace=True)    #Remove index column
y = df["Target"]
X = df.drop("Target", axis=1)

le = preprocessing.LabelEncoder()
X = X.apply(le.fit_transform)


X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42)


hyper_params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['l2', 'auc'],
    'learning_rate': 0.005,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.7,
    'bagging_freq': 10,
    'verbose': 0,
    "max_depth": 8,
    "num_leaves": 128,  
    "max_bin": 512,
    "num_iterations": 100000,
    "n_estimators": 1000
}

cat_feature_list = np.where(X.dtypes != float)[0]

gbm = lgb.LGBMRegressor(**hyper_params, categorical_feature=cat_feature_list)

gbm.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric='l1',
        early_stopping_rounds=1000)


The error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Upvotes: 1

Views: 2016

Answers (1)

Mustafa Aydın

Reputation: 18296

This line is problematic:

cat_feature_list = np.where(X.dtypes != float)[0]

(I wish you had shared the full traceback of the error; it would have saved time.)

X.dtypes != float gives a pandas Series of booleans; NumPy then tries to evaluate its truthiness, hence the error. To get the names of the categorical columns as a list:

cat_feature_list = X.select_dtypes("object").columns.tolist()
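One caveat worth checking, sketched below on made-up data: select_dtypes("object") only finds columns that still hold strings, so it has to run before the label encoding. Once the columns are integer codes it returns an empty list. The toy frame, the column names and the use of pandas category codes in place of LabelEncoder are all assumptions for illustration, not the asker's data or code. The last step shows the positional variant from the question converted to a plain Python list with .tolist(). Whether handing over a list instead of a NumPy array avoids the error is a guess on my part, since the traceback was not posted.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for TrainModelling.csv.
# dtype=object is set explicitly so the columns are string-like "object"
# columns regardless of the pandas version's default string dtype.
X_raw = pd.DataFrame(
    {
        "colour": ["red", "blue", "red", "green"],
        "size": ["S", "M", "L", "M"],
    },
    dtype=object,
)

# Collect the categorical column names BEFORE encoding,
# while the columns still have object dtype.
cat_names = X_raw.select_dtypes("object").columns.tolist()

# Integer-encode every column. Category codes are used here as a
# stand-in for LabelEncoder (assumption: any integer encoding behaves
# the same way for the purpose of this dtype check).
X_enc = X_raw.apply(lambda col: col.astype("category").cat.codes)

# After encoding, no object columns remain, so the same call is empty.
names_after = X_enc.select_dtypes("object").columns.tolist()

# Positional alternative from the question, converted from a NumPy
# array to a plain list of Python ints.
cat_positions = np.where(X_enc.dtypes != float)[0].tolist()

print(cat_names)
print(names_after)
print(cat_positions)
```

Either cat_names or cat_positions can then be used as the list of categorical features, as long as it is built from the right frame (names before encoding, positions after).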

Upvotes: 1
