Reputation: 1966
I want to implement KNN in Python. So far I have loaded my data into a pandas DataFrame.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
train_df = pd.read_csv("creditlimit_train.csv") # train dataset
train_df.head()
The output of head is
SNo Salary LoanAmt Level
101 100000 10000 Low Level
102 108500 11176 Low Level
103 125500 13303 Low Level
104 134000 14606 Low Level
105 142500 15960 Low Level
test_df = pd.read_csv("creditlimit_test.csv")
test_df.head()
The output of head is
SNo Salary LoanAmt Level
101 100000 10000 Low Level
102 108500 11176 Low Level
103 125500 13303 Low Level
104 134000 14606 Low Level
105 142500 15960 Low Level
neigh = KNeighborsClassifier(n_neighbors=5,algorithm='auto')
predictor_features = ['Salary','LoanAmt']
dependent_features = ['Level']
neigh.fit(train_df[predictor_features],train_df[dependent_features])
How do I use the fitted model, with Salary and LoanAmt as predictors, to predict the Level for each row of my test_df?
Update 1: There are three levels: Low, Medium and High.
Upvotes: 1
Views: 1621
Reputation: 1723
You can convert your DataFrame to NumPy arrays and pass them to fit as input:
# convert class labels to numerical data, assuming you have two classes
# (replace returns a new Series, so assign the result back to the column)
df['Level'] = df['Level'].replace({'Low Level': 0, 'High Level': 1})
# extract data and class labels
data = df[['Salary','LoanAmt']]
target = df['Level']
# convert df to numpy arrays
data = data.values
target = target.values
# ideally you would do a train/test split:
# train the model on the training data and measure accuracy on the test data
# pass the arrays to the fit function
neigh = KNeighborsClassifier(n_neighbors=5,algorithm='auto')
neigh.fit(data,target)  # features first, then the class labels
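To get predictions for test_df, call predict on the fitted classifier with the same feature columns. The sketch below is a minimal illustration, not the asker's actual data: the two small DataFrames are hypothetical stand-ins for creditlimit_train.csv and creditlimit_test.csv, the Medium and High rows are invented, and the PredictedLevel column name is an assumption. It also passes the string labels directly, since scikit-learn classifiers accept them without numeric encoding.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-ins for the two CSV files; the Medium/High rows
# are invented purely for illustration.
train_df = pd.DataFrame({
    "Salary":  [100000, 108500, 125500, 300000, 310000, 320000,
                600000, 610000, 620000],
    "LoanAmt": [10000, 11176, 13303, 40000, 41000, 42000,
                90000, 91000, 92000],
    "Level":   ["Low Level"] * 3 + ["Medium Level"] * 3 + ["High Level"] * 3,
})
test_df = pd.DataFrame({
    "Salary":  [105000, 315000, 605000],
    "LoanAmt": [10500, 41500, 90500],
})

predictor_features = ["Salary", "LoanAmt"]

# n_neighbors=3 only because the toy data has three rows per class.
neigh = KNeighborsClassifier(n_neighbors=3)

# Features as a 2-D selection, labels as a 1-D Series of strings.
neigh.fit(train_df[predictor_features], train_df["Level"])

# Predict one label per test row and store it next to the features.
test_df["PredictedLevel"] = neigh.predict(test_df[predictor_features])
print(test_df)
```

If test_df also carries the true Level column, as in the question, `neigh.score(test_df[predictor_features], test_df["Level"])` returns the mean accuracy on the test set.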
Some useful links:
Convert pandas dataframe to numpy array, preserving index
Replacing few values in a pandas dataframe column with another value
Selecting columns in a pandas dataframe
Upvotes: 1