user2728024

Reputation: 1546

feature_names must be unique - Xgboost

I am running an xgboost model on a very sparse matrix.

I am getting this error: ValueError: feature_names must be unique

How can I deal with this?

This is my code.

  yprob = bst.predict(xgb.DMatrix(test_df))[:,1]

Upvotes: 14

Views: 21623

Answers (4)

Ayan Mitra

Reputation: 525

I converted the DataFrame to a NumPy array with np.array(df), and that solved my problem.

Upvotes: 0

Arjan Groen

Reputation: 624

Assuming the problem is indeed that columns are duplicated, the following line should solve your problem:

test_df = test_df.loc[:,~test_df.columns.duplicated()]

Source: python pandas remove duplicate columns

This line should identify which columns are duplicated:

duplicate_columns = test_df.columns[test_df.columns.duplicated()]
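To see what the two lines above do, here is a minimal sketch on a toy DataFrame. The data and the repeated column name "b" are made up for illustration; only the pandas calls come from the answer.

```python
import pandas as pd

# Toy frame with a repeated column name "b" (hypothetical data).
test_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "b"])

# Every occurrence of a name after its first one is flagged as a duplicate.
duplicate_columns = test_df.columns[test_df.columns.duplicated()]
print(list(duplicate_columns))  # ['b']

# Keep only the first occurrence of each column name.
deduped_df = test_df.loc[:, ~test_df.columns.duplicated()]
print(list(deduped_df.columns))  # ['a', 'b']
```

Note that this drops the later copies of a column entirely, so it is only appropriate if the duplicated columns really hold redundant data.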

Upvotes: 8

Akshay

Reputation: 94

One way around this is to make sure every column name is unique when you prepare the data; after that it should work.
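If the duplicated columns hold different data and you would rather rename than drop them, something like the following might work. make_unique is a hypothetical helper, not part of pandas or xgboost, and the _1, _2 suffix scheme is just one possible convention.

```python
def make_unique(names):
    """Hypothetical helper: append _1, _2, ... to names that repeat."""
    taken = set()
    result = []
    for name in names:
        candidate = name
        suffix = 0
        # Bump the suffix until the candidate has not been used yet.
        while candidate in taken:
            suffix += 1
            candidate = f"{name}_{suffix}"
        taken.add(candidate)
        result.append(candidate)
    return result


new_names = make_unique(["a", "b", "b"])
print(new_names)  # ['a', 'b', 'b_1']
```

With a DataFrame you would assign the result back, e.g. test_df.columns = make_unique(test_df.columns). The renamed columns should still correspond to the features the model was trained on, so apply the same renaming to the training data.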

Upvotes: 0

andrew_reece

Reputation: 21264

According to the xgboost source code, this error is raised in only one place: an internal DMatrix function. Here's the relevant excerpt:

if len(feature_names) != len(set(feature_names)):
    raise ValueError('feature_names must be unique')

So, the error text is pretty literal here; your test_df has at least one duplicate feature/column name.

You've tagged pandas on this post; that suggests test_df is a Pandas DataFrame. In this case, DMatrix literally runs df.columns to extract feature_names. Check your test_df for repeat column names, remove or rename them, and then try DMatrix() again.
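As a rough sketch, you can run the same uniqueness test yourself before building the DMatrix and report which names repeat. find_duplicate_names is a hypothetical helper and the column list is made up; with a real DataFrame you would pass test_df.columns.

```python
from collections import Counter


def find_duplicate_names(feature_names):
    """Hypothetical helper: map each repeated name to its number of occurrences."""
    counts = Counter(feature_names)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up column names; substitute list(test_df.columns).
feature_names = ["age", "income", "age", "zip", "age"]

# Same condition as the xgboost excerpt above.
has_duplicates = len(feature_names) != len(set(feature_names))
duplicates = find_duplicate_names(feature_names)
print(has_duplicates, duplicates)  # True {'age': 3}
```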

Upvotes: 13
