Reputation: 33
I'm trying to fit a dataset to a logistic regression model, but I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
I've tried filling the missing values of the Age column and then fitting the model, but it still doesn't work. Note: I'm using Python 3.7.1.
import pandas as pd
train = pd.read_csv('titanic_train.csv')
X = train.drop('Survived',axis=1)
y = train['Survived']
from sklearn.model_selection import train_test_split
train['Age'].isnull().values.any()
train['Age'].fillna(train['Age'].mean())
X_train, X_test, y_train,y_test = train_test_split(train.drop('Survived',axis=1),train['Survived'],test_size=0.3,random_state=101)
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
I expect the model to fit so that I can then compute the confusion matrix.
Upvotes: 3
Views: 137
Reputation: 19885
The reason is this line:
train['Age'].fillna(train['Age'].mean())
Most pandas methods return a new object; they do not modify the object they are called on unless you explicitly tell them to. Here the filled Series is discarded, so train['Age'] still contains NaN when the model is fitted. Therefore, you need to do one of the following:
Pass inplace=True: train['Age'].fillna(train['Age'].mean(), inplace=True)
Assign the result back: train['Age'] = train['Age'].fillna(train['Age'].mean())
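Note that doing both will not work: with inplace=True, fillna returns None, so assigning that result back would replace the column with None.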
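To see the difference for yourself, here is a minimal sketch. The three-row DataFrame is a made-up stand-in for titanic_train.csv (an assumption for illustration only), and it uses the assign-back form, which should behave the same way across pandas versions:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data: one missing Age value.
train = pd.DataFrame({"Age": [22.0, None, 30.0], "Survived": [0, 1, 1]})

# Calling fillna without keeping the result leaves the column unchanged.
train["Age"].fillna(train["Age"].mean())
still_missing = train["Age"].isnull().any()

# Assigning the result back replaces the column with the filled copy.
train["Age"] = train["Age"].fillna(train["Age"].mean())
now_missing = train["Age"].isnull().any()

print(still_missing, now_missing)
```

It is also worth running train.isnull().sum() on the real data before fitting, since columns other than Age may contain missing values too.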
Upvotes: 4