user7675045

KNeighborsClassifier .fit method raises "ValueError: The truth value of a Series is ambiguous."

I have read numerous Q&As about this error message; however, I still do not understand why the ambiguous truth value error is raised when I call the KNeighborsClassifier .fit method. My code and data are relatively straightforward:

First, I drop every row of the Opt_Data dataframe that contains a NaN value and assign the result to the variable Training_Data.

Training_Data = Opt_Data.dropna(axis=0,how='any')

Screenshot of the sample dataset

Next, I create two numpy arrays from the Training_Data dataframe. The X_Train array consists of data from Columns 1 - 10 and the Y_Train array consists of data from the Target Column. In the code below, the variable name question is the column name of the Target Column.

X_Train = np.array(Training_Data.loc[:,Training_Data.columns != question])

Y_Train = np.array(Training_Data[question])
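For reference, the same feature/target split can be sketched with drop and to_numpy. The small Opt_Data frame and the column names below are hypothetical stand-ins for the real data, which is not shown in the post:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Opt_Data: two feature columns and one target column.
question = "Target"
Opt_Data = pd.DataFrame({
    "Col1": [1, 0, 1, np.nan],
    "Col2": [0, 0, 1, 1],
    "Target": [1, 0, 1, 0],
})

# Drop every row that contains at least one NaN.
Training_Data = Opt_Data.dropna(axis=0, how="any")

# Features: every column except the target. Target: the single named column.
X_Train = Training_Data.drop(columns=question).to_numpy()
Y_Train = Training_Data[question].to_numpy()

print(X_Train.shape, Y_Train.shape)
```

Both arrays are plain NumPy arrays at this point, so the training data itself should not carry any pandas Series into the classifier.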

After creating my arrays, I instantiate KNeighborsClassifier and assign the result to the variable knn. The variable opt_neighbors is an integer value (29). When I call the .fit method on knn, I get the aforementioned ValueError: "The truth value of a Series is ambiguous."

knn = KNeighborsClassifier(n_neighbors=opt_neighbors,weights='distance',metric='hamming')

knn.fit(X_Train,Y_Train)

The shape of the actual X_Train array is (1783,10) and the shape of the actual Y_Train array is (1783,).

I read a blog post stating that duplicate rows could cause this error. However, when I applied the drop_duplicates method to the Training_Data dataframe and ran the same code, I received the same error message.

I also read that "The or and and python statements require truth-values. For pandas these are considered ambiguous so you should use "bitwise" | (or) or & (and) operations." However, I am not sure how this statement applies as I am not using or or and statements explicitly.
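The same error can appear without any explicit or/and, because an if statement also asks for a single truth value. The sketch below is only an illustration: check_positive is a hypothetical validator, not scikit-learn's actual code, and the Series is made up.

```python
import pandas as pd

# Hypothetical one-element Series, such as a value selected from a dataframe.
n = pd.Series([19.0], index=[135], name="Optimal_Neighbors")

def check_positive(value):
    # Hypothetical validator. The `if` calls bool() on the comparison result,
    # which is itself a Series when `value` is a Series.
    if value <= 0:
        raise ValueError("expected a positive value")
    return value

try:
    check_positive(n)
    message = None
except ValueError as err:
    # pandas refuses to reduce a Series to a single True/False.
    message = str(err)

print(message)
```

Passing a plain number such as check_positive(19) goes through without complaint, so any argument that a library compares or tests internally is worth checking with type().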

I greatly appreciate any help that anyone can offer me. Thank you!

Answers (1)

user7675045

A portion of my code that I thought was irrelevant turned out to be the cause of the problem.

In the code below, I assign an element from a dataframe (Opt_report) to the variable opt_neighbors. I expected this assignment to produce a scalar value; however, the result is a pandas Series that consists of an index label (135) and the value 19.0. When I pass this variable to the n_neighbors argument of KNeighborsClassifier, the estimator stores the whole Series, as shown on the second line of Out [3]: n_neighbors=135 19.0. The classifier was therefore configured with a Series instead of an integer, and the .fit method raised the error, presumably when it tried to evaluate n_neighbors as a single truth value.

In  [1]:  opt_neighbors = Opt_report['Optimal_Neighbors']
Out [1]:  135 19.0
          Name: Optimal_Neighbors, dtype: float64

In  [2]: type(opt_neighbors)
Out [2]: pandas.core.series.Series

In  [3]: knn = KNeighborsClassifier(n_neighbors=opt_neighbors,weights='distance',metric='hamming')
         knn
Out [3]: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='hamming', metric_params=None, n_jobs=1,
         n_neighbors=135    19.0
         Name: Optimal_Neighbors, dtype: float64,p=2, weights='distance') 

Converting the Series to a plain integer, as shown below, fixes the problem.

In  [4]: opt_neighbors = int(Opt_report['Optimal_Neighbors'])
         opt_neighbors
Out [4]: 19
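A slightly more defensive variant is sketched below. It assumes the selection is expected to hold exactly one value; to_int_scalar is a hypothetical helper and the Opt_report frame is a made-up stand-in. As far as I know, recent pandas releases warn when int() is applied directly to a single-element Series, so the sketch pulls the element out with .iloc[0] first.

```python
import pandas as pd

# Hypothetical stand-in for Opt_report: one row with index label 135.
Opt_report = pd.DataFrame({"Optimal_Neighbors": [19.0]}, index=[135])

# Selecting a column returns a Series, even when it holds a single value.
selection = Opt_report["Optimal_Neighbors"]

def to_int_scalar(series):
    # Hypothetical helper: insist on exactly one element, then convert it.
    if len(series) != 1:
        raise ValueError(f"expected exactly one value, got {len(series)}")
    return int(series.iloc[0])

opt_neighbors = to_int_scalar(selection)
print(type(opt_neighbors), opt_neighbors)
```

The resulting opt_neighbors is a built-in int, which can then be passed as n_neighbors.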
