Reputation: 1546
I am running an xgboost model on a very sparse matrix and getting this error:
ValueError: feature_names must be unique
How can I deal with this?
This is my code:
yprob = bst.predict(xgb.DMatrix(test_df))[:,1]
Upvotes: 14
Views: 21623
Reputation: 624
Assuming the problem is indeed that columns are duplicated, the following line should solve your problem:
test_df = test_df.loc[:,~test_df.columns.duplicated()]
Source: python pandas remove duplicate columns
This line should identify which columns are duplicated:
duplicate_columns = test_df.columns[test_df.columns.duplicated()]
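As a rough illustration of the two lines above, here is a minimal sketch on a made-up frame (the column names "a" and "b" and the values are hypothetical, not from the question). Note that this approach keeps the first occurrence of each name and discards the data in the later ones, which may or may not be what you want.

```python
import pandas as pd

# Hypothetical frame with a repeated column name "b".
test_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "b"])

# Boolean mask: True for every column whose name already appeared earlier.
dup_mask = test_df.columns.duplicated()

# Names that occur more than once (each repeat is listed).
duplicate_columns = test_df.columns[dup_mask].tolist()

# Keep only the first occurrence of each column name.
deduped = test_df.loc[:, ~dup_mask]

print(duplicate_columns)
print(list(deduped.columns))
```

If the repeated columns hold different data, renaming them is probably safer than dropping them.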
Upvotes: 8
Reputation: 94
One way around this is to give every column a unique name while preparing the data; the DMatrix should then build without this error.
Upvotes: 0
Reputation: 21264
According to the xgboost source code documentation, this error occurs in only one place: a DMatrix internal function. Here's the source code excerpt:
if len(feature_names) != len(set(feature_names)):
    raise ValueError('feature_names must be unique')
So, the error text is pretty literal here; your test_df has at least one duplicate feature/column name.
You've tagged pandas on this post; that suggests test_df is a Pandas DataFrame. In this case, DMatrix literally runs df.columns to extract feature_names. Check your test_df for repeat column names, remove or rename them, and then try DMatrix() again.
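If you would rather rename than remove, something like the sketch below could work. The helper make_unique is a hypothetical name of my own, not part of pandas or xgboost; it applies the same uniqueness test as the excerpt above and then appends a numeric suffix to each repeated name.

```python
def make_unique(names):
    """Return a list of names where repeats get a numeric suffix.

    Hypothetical helper: e.g. a second "b" becomes "b_1", a third "b_2".
    """
    used = set()
    result = []
    for name in names:
        candidate = name
        suffix = 0
        # Bump the suffix until the candidate collides with nothing seen so far.
        while candidate in used:
            suffix += 1
            candidate = f"{name}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


columns = ["a", "b", "b", "b"]

# Same check xgboost performs: a set drops repeats, so the lengths differ.
has_duplicates = len(columns) != len(set(columns))

renamed = make_unique(columns)
print(has_duplicates, renamed)
```

With a DataFrame this would be applied as test_df.columns = make_unique(test_df.columns) before calling xgb.DMatrix(test_df). The model will likely expect the same feature names it saw during training, so the training frame would presumably need the same renaming.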
Upvotes: 13