Adithya

sklearn error: I've filled the missing values of a column but I'm still getting the error below

I'm trying to fit a logistic regression model to the dataset, but I get the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I've tried filling the missing values of the Age column and then fitting the model again, but it still isn't working. Note: I'm using Python 3.7.1.

import pandas as pd

train = pd.read_csv('titanic_train.csv')

X = train.drop('Survived',axis=1)
y = train['Survived']

from sklearn.model_selection  import train_test_split

train['Age'].isnull().values.any()

train['Age'].fillna(train['Age'].mean())

X_train, X_test, y_train,y_test = train_test_split(train.drop('Survived',axis=1),train['Survived'],test_size=0.3,random_state=101)

from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)

Expected result: the model should fit, and I should be able to get the confusion matrix.
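scikit-learn raises this ValueError when any value passed to fit is NaN or infinite, and X_train contains every column except Survived, not just Age. A per-column count of missing values right before fitting shows which columns still need attention. The minimal sketch below assumes a small hypothetical frame in place of titanic_train.csv; the Fare column and its missing value are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for titanic_train.csv
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 8.05],
})

# The call from the question: its result is never stored, so train is unchanged
train["Age"].fillna(train["Age"].mean())

# Count missing values in every column that would be passed to the model
missing = train.drop("Survived", axis=1).isnull().sum()
print(missing)
```

Any column with a non-zero count will still trigger the error. Here Age still reports 2 missing values because the result of the bare fillna call was discarded.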

Answers (1)

gmds

The reason is this line:

train['Age'].fillna(train['Age'].mean())

Like most pandas methods, fillna returns a new object instead of modifying the one it is called on, unless you explicitly tell it to. Therefore, you need to do one of the following:

  1. Set inplace=True:
train['Age'].fillna(train['Age'].mean(), inplace=True)
  2. Reassign the result:
train['Age'] = train['Age'].fillna(train['Age'].mean())

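Note that doing both will not work: with inplace=True, fillna modifies the data in place and returns None, so train['Age'] = train['Age'].fillna(train['Age'].mean(), inplace=True) would overwrite the column with None.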
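The difference between a bare call and a reassignment can be checked on a toy column. This is a minimal sketch assuming a hypothetical four-row frame. It uses reassignment (option 2), because pandas 2.2 and later emit a FutureWarning when an inplace method is called on a column selected with train['Age'], and that pattern no longer updates the DataFrame once Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing ages
train = pd.DataFrame({"Age": [22.0, np.nan, 26.0, np.nan]})

# Bare call: fillna returns a new Series; train itself is left unchanged
filled = train["Age"].fillna(train["Age"].mean())
missing_after_bare_call = train["Age"].isnull().sum()
print(missing_after_bare_call)  # 2
print(filled.isnull().sum())    # 0

# Fix: assign the filled Series back to the column (option 2 above)
train["Age"] = train["Age"].fillna(train["Age"].mean())
print(train["Age"].tolist())    # [22.0, 24.0, 26.0, 24.0]
```

The mean is computed while skipping NaN, so both gaps are filled with (22 + 26) / 2 = 24.0.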
