chandlerchow

Reputation: 11

Why does LightGBM training fail with "Wrong size of feature_names"?

When I trained a Dataset with LightGBM, training progressed as usual until an unexpected error appeared: "LightGBMError: Wrong size of feature_names". What went wrong?

env:Linux-Red Hat 4.8.5

memory: 500G

python: python 3.6.5

lightgbm: lightgbm 2.2.3

I have been using the same approach and code to train on other datasets, and it always worked before. This time, however, the dataset is much larger (about 40 GB raw, about 80 GB once loaded as a pandas DataFrame), and after one-hot encoding its feature names include some Chinese characters.

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

import dic_description as dic
……
List_cat1 = list(df_cat_dummy.columns)
df_l = df['LOST_TAG']
del df['LOST_TAG']
x_train, x_test, y_train, y_test = train_test_split(df, df_l, test_size=0.3, random_state=32)
del df

lgb_train = lgb.Dataset(x_train, y_train, free_raw_data=False,
                        feature_name=list(x_train.columns), categorical_feature=List_cat1)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train, free_raw_data=False,
                       feature_name=list(x_test.columns), categorical_feature=List_cat1)
……
lgb1 = lgb.train(params,
             lgb_train,
             num_boost_round=1000,
             valid_sets=[lgb_eval, lgb_train],
             early_stopping_rounds=50)

Training still ran fine this time: it hit early stopping and reported the best iteration. But the error was raised immediately afterwards:

LightGBMError     Traceback (most recent call last)

…………

LightGBMError: Wrong size of feature_names

I then searched online for a long time. As others suggested, I tried not passing feature_name and categorical_feature to lightgbm.Dataset, but that did not help.

Upvotes: 0

Views: 1952

Answers (1)

chandlerchow

Reputation: 11

I found the answer, and I think it did involve the Chinese characters. I changed all the feature names to short names without Chinese characters, and the problem was solved.
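For anyone wanting to try the same workaround, here is a minimal sketch of one way to do the renaming. The helper name `make_safe_feature_names`, the `f0, f1, ...` naming scheme, and the sample column names are all made up for illustration; they are not from my real code. It only shows the renaming step and does not explain why LightGBM raised the error.

```python
def make_safe_feature_names(columns, prefix="f"):
    """Return short ASCII-only names, one per original column, in the same order.

    Hypothetical helper: names are simply prefix + position (f0, f1, ...).
    """
    return [f"{prefix}{i}" for i, _ in enumerate(columns)]


# Illustrative column names, including one-hot columns with Chinese characters.
original = ["age", "城市_北京", "城市_上海"]
safe_names = make_safe_feature_names(original)

# Keep both directions so feature importances can be mapped back later.
to_safe = dict(zip(original, safe_names))
to_original = dict(zip(safe_names, original))

# With a pandas DataFrame, the names could then be applied before building
# the lgb.Dataset, for example:
#   x_train.columns = make_safe_feature_names(x_train.columns)
# and the categorical list translated the same way:
#   List_cat1 = [to_safe[c] for c in List_cat1]
print(safe_names)
```

Positional names are used here because they stay short and unique even if two original columns would look the same after stripping non-ASCII characters. The `to_original` dictionary is there so the readable names can be restored when inspecting the trained model.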

Upvotes: 1
