Mariyan

Reputation: 125

How to fix the warning "X does not have valid feature names, but IsolationForest was fitted with feature names"

Here is my code:

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.ensemble import IsolationForest

data = pd.read_csv('marks1.csv', encoding='latin-1',
                   on_bad_lines='skip', index_col=0, header=0
                   )

random_state = np.random.RandomState(42)

model = IsolationForest(n_estimators=100, max_samples='auto', contamination=0.2,
                        random_state=random_state)

model.fit(data[['Mark']])

random_state = np.random.RandomState(42)

data['scores'] = model.decision_function(data[['Mark']])

data['anomaly_score'] = model.predict(data[['Mark']])

data[data['anomaly_score'] == -1].head()

Warning:

C:\Program Files\Python39\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but IsolationForest was fitted with feature names
  warnings.warn(

Upvotes: 10

Views: 17515

Answers (1)

Joe Dattoli

Reputation: 178

It depends on the version of scikit-learn you are using. Since version 1.0, estimators fitted on a DataFrame store the column names in a feature_names_in_ attribute and check later inputs against them. A bug in that release triggered this warning when training with DataFrames: https://github.com/scikit-learn/scikit-learn/issues/21577

I'm not up to date with the new best practices for this yet, so I cannot say definitively how it should be set up. For now I have just sidestepped the issue in my code: I convert my DataFrames to NumPy arrays before training.

df.to_numpy()
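
Applied to the code in the question, the workaround might look like the sketch below. The DataFrame here is synthetic, hypothetical data in place of marks1.csv, and the warning capture is only there to show that no feature-name warning is raised.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical stand-in for marks1.csv: mostly typical marks plus a few extremes.
rng = np.random.RandomState(42)
data = pd.DataFrame({"Mark": np.append(rng.normal(60, 5, 95), [5, 8, 98, 99, 100])})

# Convert to a plain NumPy array so no feature names are recorded at fit time.
X = data[["Mark"]].to_numpy()

model = IsolationForest(n_estimators=100, contamination=0.2, random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X)
    data["scores"] = model.decision_function(X)
    data["anomaly_score"] = model.predict(X)

# Collect any warnings that mention feature names (none are expected here).
feature_name_warnings = [w for w in caught if "feature names" in str(w.message)]
print(len(feature_name_warnings))
print(data[data["anomaly_score"] == -1].head())
```

The trade-off is that the fitted model no longer knows the column names, so the caller has to keep the column order consistent between fit and predict.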

Upvotes: 16
