Ash Upadhyay

Reputation: 1966

How to implement KNN in python?

I want to implement KNN in Python. So far I have loaded my data into a pandas DataFrame.

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
train_df = pd.read_csv("creditlimit_train.csv") # train dataset
train_df.head()

The output of head is

SNo      Salary      LoanAmt   Level
101      100000      10000     Low Level
102      108500      11176     Low Level
103      125500      13303     Low Level
104      134000      14606     Low Level
105      142500      15960     Low Level


test_df = pd.read_csv("creditlimit_test.csv")
test_df.head()

The output of head is

SNo      Salary      LoanAmt   Level
101      100000      10000     Low Level
102      108500      11176     Low Level
103      125500      13303     Low Level
104      134000      14606     Low Level
105      142500      15960     Low Level

neigh = KNeighborsClassifier(n_neighbors=5,algorithm='auto')
predictor_features = ['Salary','LoanAmt']
dependent_features = ['Level']
neigh.fit(train_df[predictor_features],train_df[dependent_features])

How do I use the fit function with Salary and LoanAmt as predictors to predict the Level values for my test_df?

Update 1: There are three levels: Low, Medium and High.

Upvotes: 1

Views: 1621

Answers (1)

Yuvraj Jaiswal

Reputation: 1723

You can convert your DataFrame to NumPy arrays and pass them to fit as input:

# Optional: convert class labels to numbers (scikit-learn also accepts string labels).
# The question's update mentions three levels; the 'Medium Level' and
# 'High Level' spellings below are assumed from the 'Low Level' values shown.
label_map = {'Low Level': 0, 'Medium Level': 1, 'High Level': 2}
train_df['Level'] = train_df['Level'].map(label_map)

# extract the features and the class labels
data = train_df[['Salary','LoanAmt']]
target = train_df['Level']

# convert the DataFrame and Series to numpy arrays
data = data.values
target = target.values

# Ideally, do a train/test split: train the model on the training data
# and measure its accuracy on the test data.

# pass the arrays to fit: features first, then labels
neigh = KNeighborsClassifier(n_neighbors=5,algorithm='auto')
neigh.fit(data,target)
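
To get predictions for test_df, the fitted model's predict method can be called on the same predictor columns. Below is a minimal sketch of that step, not a definitive implementation. The two small DataFrames are made-up stand-ins for creditlimit_train.csv and creditlimit_test.csv, and the 'Medium Level' and 'High Level' label spellings are assumptions based on the 'Low Level' values in the question. The labels are left as strings here, and the target is passed as the single column train_df["Level"], which is a 1-D Series.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up data standing in for the two CSV files (assumption, not the real data).
salaries = [100000, 108500, 125500, 134000, 142500,
            300000, 310000, 320000, 330000, 340000,
            600000, 610000, 620000, 630000, 640000]
train_df = pd.DataFrame({
    "Salary": salaries,
    "LoanAmt": [s // 10 for s in salaries],
    # 'Medium Level' and 'High Level' are assumed spellings.
    "Level": ["Low Level"] * 5 + ["Medium Level"] * 5 + ["High Level"] * 5,
})
test_df = pd.DataFrame({
    "Salary": [120000, 315000, 625000],
    "LoanAmt": [12000, 31500, 62500],
    "Level": ["Low Level", "Medium Level", "High Level"],
})

predictor_features = ["Salary", "LoanAmt"]

# Fit on the training features and the 1-D label column.
neigh = KNeighborsClassifier(n_neighbors=5, algorithm="auto")
neigh.fit(train_df[predictor_features], train_df["Level"])

# Predict a level for every row of the test set.
test_df["Predicted"] = neigh.predict(test_df[predictor_features])

# Fraction of test rows whose predicted level matches the known level.
accuracy = neigh.score(test_df[predictor_features], test_df["Level"])
print(test_df)
print("accuracy:", accuracy)
```

With real data, the two DataFrames would come from pd.read_csv as in the question. The score call only makes sense when the test file contains the true Level column, as the head output in the question suggests.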

Some useful links:

Convert pandas dataframe to numpy array, preserving index

Replacing few values in a pandas dataframe column with another value

Selecting columns in a pandas dataframe

Upvotes: 1
