Reputation: 21
I wrote a function in Python to replace missing values using k-NN. Here is the function:
def missing_variables_knn(x):
    test = data[data[x].isnull()]
    train = data[data[x].isnull()==False]
    X_train = train.loc[:, ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']]
    Y_train = train[x]
    X_test = test.loc[:, ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']]
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, Y_train)
    pred = knn.predict(X_test)
    pred = pd.Series(pred)
    data[x].fillna(pred)
When I called missing_variables_knn('Gender'), I got an error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Upvotes: 2
Views: 241
Reputation: 55
The library needs a single value that is unambiguously either true or false. Your function does not guarantee that: somewhere a whole Series, whose elements can be a mix of true and false, ends up where one boolean is expected. That's why pandas reports it as ambiguous.
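To illustrate what "ambiguous" means here, this is a minimal sketch that reproduces the message with a made-up Series (the names `flags`, `data` and `known` are only for this example, not taken from your code) and shows the explicit reductions the error text suggests:

```python
import pandas as pd

flags = pd.Series([True, False, True])

# Using a whole Series where a single bool is expected raises the error.
raised = False
message = ""
try:
    if flags:
        pass
except ValueError as exc:
    raised = True
    message = str(exc)

# Reduce the Series to one boolean explicitly instead.
any_true = bool(flags.any())  # True if at least one element is True
all_true = bool(flags.all())  # True only if every element is True

# For row selection, build the mask element-wise with ~, & and |.
data = pd.DataFrame({"Gender": ["Male", None, "Female"]})
known = data[~data["Gender"].isnull()]  # rows where Gender is present
```

Python's `not`, `and` and `or` applied to a Series trigger the same error, because each of them asks the Series for a single truth value.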
What you should do is use other functions such as .filter(). There is a related post here: https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o/36922103
Most likely the error is on this line: train = data[data[x].isnull()==False]
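For reference, here is a rough sketch of how the same imputation could be written with `~` in place of `==False`. I can't confirm it removes your exact error. The function name `fill_missing_with_knn`, its parameters and the toy data are all made up for the example, and it assumes the four numeric feature columns contain no missing values, since KNeighborsClassifier does not accept NaN inputs. It also writes the predictions back through the mask: in your version the result of `fillna` is discarded, and `pd.Series(pred)` gets a fresh 0..n-1 index that does not line up with the rows that were missing.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

FEATURES = ["ApplicantIncome", "CoapplicantIncome", "LoanAmount", "Loan_Amount_Term"]


def fill_missing_with_knn(data, column, features=FEATURES, n_neighbors=3):
    """Return a copy of `data` with missing values in `column` predicted by k-NN."""
    data = data.copy()
    missing = data[column].isnull()  # boolean mask, one entry per row
    if not missing.any():            # reduce the mask to a single bool
        return data
    train = data[~missing]           # rows where the target is known
    test = data[missing]             # rows where the target must be predicted
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(train[features], train[column])
    # Assign through the mask so each prediction lands on its own row.
    data.loc[missing, column] = knn.predict(test[features])
    return data


# Hypothetical toy data: two well-separated groups, two rows with unknown Gender.
toy = pd.DataFrame({
    "ApplicantIncome": [5000, 5100, 4900, 1000, 1100, 900, 5050, 1050],
    "CoapplicantIncome": [0, 0, 0, 2000, 2100, 1900, 0, 2050],
    "LoanAmount": [120, 125, 118, 60, 62, 58, 121, 61],
    "Loan_Amount_Term": [360, 360, 360, 180, 180, 180, 360, 180],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female", None, None],
})
filled = fill_missing_with_knn(toy, "Gender")
```

Passing the DataFrame in and returning a copy, rather than modifying a global `data`, is just a design choice for the sketch; it makes the function easier to test on a small frame like `toy`.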
Upvotes: 1