MJeremy

Reputation: 1250

LightGBM: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I was running lightgbm with categorical features:

X_train, X_test, y_train, y_test = train_test_split(train_X, train_y, test_size=0.3)

train_data = lgb.Dataset(X_train, label=y_train, feature_name=X_train.columns, 
                                  categorical_feature=cat_features)

test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

param = {'num_trees': 4000, 'objective':'binary', 'metric': 'auc'}
bst = lgb.train(param, train_data, valid_sets=[test_data], early_stopping_rounds=100)

This raises the following error:

if self.handle is not None and feature_name is not None and feature_name != 'auto':

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I checked other similar errors on Stack Overflow, which are mostly related to numpy. I then checked the documentation and tried replacing my categorical_feature with indices like [0, 2, 5, ...] (my original was the column names of the categorical features), but I still get the same error.

I also tried replacing label with the column index, and the error is still there.

Could anyone help? Thanks in advance.

Upvotes: 8

Views: 5042

Answers (2)

MJeremy

Reputation: 1250

I also found that dropping feature_name works.

train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_features)
test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

param = {'num_trees': 4000, 'objective':'binary', 'metric': 'auc'}
bst = lgb.train(param, train_data, valid_sets=[test_data], early_stopping_rounds=100)

Upvotes: 0

Mischa Lisovyi

Reputation: 3223

I think there is an issue with the way you pass feature_name. The constructor expects a list, and you pass it a pandas.core.indexes.base.Index. The problem is that on such an object the feature_name != 'auto' condition from the if statement that the error mentions acts element-wise and returns a numpy.ndarray of booleans. Thus the and ends up combining a bool with a numpy.ndarray, and the if cannot reduce that array to a single truth value.

A simple solution would be either to convert it to a list (feature_name=X_train.columns.tolist()) or to use feature_name='auto', which will do the name extraction from a pd.DataFrame internally.
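
To see the element-wise behaviour without LightGBM, here is a minimal sketch using only pandas. The DataFrame, its column names, and the passes_check helper are made up for illustration. The helper only imitates the shape of the condition quoted in the traceback; it is not LightGBM's actual code.

```python
import pandas as pd

# Hypothetical stand-in for X_train; the column names are illustrative only.
X_train = pd.DataFrame({"age": [25, 40], "city": ["a", "b"]})


def passes_check(feature_name):
    """Imitate `feature_name is not None and feature_name != 'auto'`.

    Returns a bool, or the error message if the condition cannot be
    reduced to a single truth value.
    """
    try:
        return bool(feature_name is not None and feature_name != "auto")
    except ValueError as err:
        return f"ValueError: {err}"


# Comparing a pandas Index to a string yields one boolean per column.
elementwise = X_train.columns != "auto"

index_result = passes_check(X_train.columns)          # ambiguous truth value
list_result = passes_check(X_train.columns.tolist())  # plain list gives one bool
auto_result = passes_check("auto")                    # the 'auto' sentinel

print(elementwise)
print(index_result)
print(list_result, auto_result)
```

With a plain list the comparison is a single bool, which is why feature_name=X_train.columns.tolist() avoids the error.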

Upvotes: 8
