Pulkit Verma

Reputation: 21

Replacing null values in Pandas data frame with a series

I created a function in Python that replaces missing values using k-nearest neighbours (KNN). This is my function:

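import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# data: the loan DataFrame, used here as a global variable
def missing_variables_knn(x):
    # Rows where column x is missing (to predict) and rows where it is known (to train on)
    test = data[data[x].isnull()]
    train = data[data[x].isnull()==False]
    X_train = train.loc[:, ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']]
    Y_train = train[x]
    X_test = test.loc[:, ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']]
    # Fit a 3-nearest-neighbour classifier on the known rows, then predict the missing ones
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, Y_train)
    pred = knn.predict(X_test)
    # Wrap the predictions in a Series and pass them to fillna
    pred = pd.Series(pred)
    data[x].fillna(pred)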

When I used missing_variables_knn('Gender'), I got an error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 2

Views: 241

Answers (1)

Joaquim De la Cruz

Reputation: 55

pandas needs a single value that is either True or False. Your function does not guarantee that a single True or False is produced, which is why pandas reports the value as ambiguous.
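As a small illustration (a hypothetical toy Series, not the question's data), the error appears whenever a whole Series is used where Python expects one `True`/`False`, for example as an `if` condition. Reducing the Series explicitly with `.any()` or `.all()` gives a single boolean:

```python
import pandas as pd

# Hypothetical toy column with one missing value
s = pd.Series(["Male", None, "Female"])
mask = s.isnull()          # element-wise result: a boolean Series, not a single bool

# Using the whole Series as a condition asks pandas for one truth value
try:
    if mask:
        pass
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Reduce the Series to a single value explicitly instead
has_missing = mask.any()   # True if at least one value is missing
all_missing = mask.all()   # True only if every value is missing
print(has_missing, all_missing)
```

The same rule applies to combining masks: inside pandas expressions use the element-wise operators `&` and `|` (with parentheses) rather than Python's `and` and `or`.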

Instead, use functions such as .filter(). There is a related post here: https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o/36922103

Most likely, the error comes from this line: train = data[data[x].isnull()==False]
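For comparison, the mask on that line can also be written as `data[x].notnull()` or `~data[x].isnull()`. Separately, the question's last two lines affect the result: `pd.Series(pred)` gets a fresh 0..n-1 index, while `fillna` with a Series matches values by index label, and `fillna` returns a new Series rather than changing `data`. The sketch below is one possible rewrite, not the asker's code; it assumes `data` is passed in as an argument (the original uses a global), that the four feature columns contain no missing values (KNN needs complete feature rows), and it uses a small made-up DataFrame purely for illustration:

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

FEATURES = ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']

def missing_variables_knn(data, x, n_neighbors=3):
    """Fill NaNs in column x with KNN predictions; returns a new DataFrame."""
    missing = data[x].isnull()            # boolean mask, used only for row selection
    train = data[~missing]                # rows where x is known
    test = data[missing]                  # rows where x is missing
    if test.empty:                        # nothing to fill
        return data.copy()
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(train[FEATURES], train[x])
    # Reuse the index of the missing rows so fillna puts each prediction in the right row
    pred = pd.Series(knn.predict(test[FEATURES]), index=test.index)
    out = data.copy()
    out[x] = out[x].fillna(pred)          # fillna returns a new Series; assign it back
    return out

# Hypothetical toy data for illustration only
data = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', None, 'Female', None],
    'ApplicantIncome': [5000, 5200, 3000, 5100, 2900, 3100],
    'CoapplicantIncome': [0, 0, 1500, 0, 1600, 1400],
    'LoanAmount': [120, 130, 90, 125, 85, 95],
    'Loan_Amount_Term': [360, 360, 360, 360, 360, 360],
})
filled = missing_variables_knn(data, 'Gender')
print(filled['Gender'].tolist())
```

Returning a copy instead of mutating a global keeps the function easy to test; assigning `data = missing_variables_knn(data, 'Gender')` gives the in-place behaviour the question was after.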

Upvotes: 1
