Murtuza Husain

Reputation: 189

KeyError in Python machine learning prediction of y_test

I am trying to predict the value of the column "action_taken" for the test set (y_test) using X_train, y_train, and X_test. That is a pretty typical setup.

But I encounter a very strange error: KeyError: 'action_taken'.

Apologies, I am a bit new to machine learning.

import pandas as pd #provide dataframe format
import numpy as np #support all high-level mathematical functions
from sklearn import tree #provides various ML feature such as various classification, regression and clustering algorithms
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt #plotting library for python
import seaborn as sns #provide high level informative statistical and attractive graphics
from sklearn import metrics # for accuracy score
from sklearn.tree import export_graphviz
from sklearn.externals.six import StringIO
from IPython.display import Image
import pydotplus
from scipy import misc
%matplotlib inline

Training_logs=pd.read_csv('C:/Users/path/PA_training_data_reformed_2.CSV')
Training_logs=Training_logs.fillna('')
Testing_logs=pd.read_csv('C:/Users/path/PA_testing_data_reformed_2.csv')
Testing_logs= Testing_logs.fillna('')

from sklearn import preprocessing
corrected_data=['deny','drop','reset-client','reset-server','reset-both','block-url','block-ip','random-drop','sinkhole','syncookie-sent','block-continue','continue','block-overide','override-lockout','override']
severity= ['medium','high','critical']
action=['alert','allow','corrected_data']
event_category=['THREAT','CORRELATION']
sub_category=['spyware','url','virus','vulnerability','wildfire','wildfire-virus']
traffic_direction=['NS','EW','SN']

enc=preprocessing.OneHotEncoder()

feature_variable=['severity','event_category', 'action', 'sub_category','traffic_direction']
Training_logs = Training_logs[feature_variable]
Testing_logs = Testing_logs[feature_variable]

from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder

enc.fit(X_train)

X_test=pd.get_dummies(Testing_logs[feature_variable])

X_dict = Training_logs[feature_variable].to_dict( orient = 'records' ) # try replacing this line 

vect = DictVectorizer(sparse=False)
X_vector = vect.fit_transform(X_dict)

le = LabelEncoder()
y_train = le.fit_transform(Training_logs['action_taken'][:-1])

X_train=X_vector[:-1]
X_test=X_vector[-1:]


Upvotes: 1

Views: 2318

Answers (3)

Mobin Al Hassan

Reputation: 1044

This is because you used the wrong key in this line:

y_train = le.fit_transform(Training_logs['action_taken'][:-1])

There is no action_taken key among the columns you kept in this line:

feature_variable=['severity','event_category', 'action', 'sub_category','traffic_direction']

So the answer is simply to change the key like this:

y_train = le.fit_transform(Training_logs['action'][:-1])

Upvotes: 1

ASHu2

Reputation: 2047

A KeyError is raised when you look up a key (such as a column name) in the DataFrame and it is not there.

You can check the available columns with df.columns.
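As a minimal sketch of that check, here is a toy DataFrame (the rows are made up and only stand in for the asker's CSV) filtered the same way as in the question. After the filter, df.columns no longer contains action_taken, and the lookup raises KeyError:

```python
import pandas as pd

# Hypothetical toy rows standing in for the real training CSV
logs = pd.DataFrame({
    "severity": ["high", "medium"],
    "action": ["alert", "allow"],
    "action_taken": ["deny", "continue"],
})

# Same kind of filter as in the question: only the listed columns survive
feature_variable = ["severity", "action"]
filtered = logs[feature_variable]

# Inspect what is actually left before indexing
print(list(filtered.columns))
has_target = "action_taken" in filtered.columns

# Looking up a column that was filtered out raises KeyError
try:
    filtered["action_taken"]
    raised = False
except KeyError:
    raised = True
```

Printing the columns right before the failing line is usually the quickest way to confirm which name is missing.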

In your case, feature_variable determines which columns Training_logs and Testing_logs keep, and action_taken is not among them.

So the solution is one of the following:

  • Add action_taken to your feature_variable:
    feature_variable=['severity','event_category', 'action', 'sub_category','traffic_direction', 'action_taken']

  • If the column is actually named 'action' rather than 'action_taken', use that name instead:
    y_train = le.fit_transform(Training_logs['action'][:-1])
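The first option could look roughly like the sketch below. The data is invented for illustration, and the name target is an assumption of mine, not something from the question. Note that if the target column stays in the filtered frame, it should be dropped again before building the feature matrix, otherwise the model sees the answer as an input.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; the real data comes from the asker's CSV
logs = pd.DataFrame({
    "severity": ["high", "medium", "critical"],
    "action": ["alert", "allow", "alert"],
    "action_taken": ["deny", "continue", "deny"],
})

feature_variable = ["severity", "action"]
target = "action_taken"  # illustrative name, not from the question

# Keep the target alongside the features so the later lookup succeeds
kept = logs[feature_variable + [target]]

# Encode the string labels as integers (classes are sorted alphabetically)
le = LabelEncoder()
y_train = le.fit_transform(kept[target])

# Drop the target again before using the frame as model input
X_frame = kept.drop(columns=[target])
```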

Upvotes: 0

Davide ND

Reputation: 994

Your Training_logs dataframe only has the columns that you selected via feature_variable = ['severity','event_category', 'action', 'sub_category','traffic_direction'] followed by Training_logs = Training_logs[feature_variable], so action_taken is no longer there.

You should probably add action_taken to the filtered variables.
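A variation on the same idea, sketched under the assumption that the raw CSV really does contain an action_taken column: pull the target out before filtering, so the feature list stays features-only. The toy rows and the names raw, X_frame and y_raw are made up for illustration.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy rows standing in for the training CSV
raw = pd.DataFrame({
    "severity": ["high", "medium", "critical", "high"],
    "action": ["alert", "allow", "alert", "allow"],
    "action_taken": ["deny", "continue", "deny", "continue"],
})

feature_variable = ["severity", "action"]

# Take the target from the unfiltered frame, then select the features
y_raw = raw["action_taken"]
X_frame = raw[feature_variable]

# One-hot encode the categorical features via DictVectorizer
vect = DictVectorizer(sparse=False)
X = vect.fit_transform(X_frame.to_dict(orient="records"))

# Encode the string labels as integers
le = LabelEncoder()
y = le.fit_transform(y_raw)

# Fit a small tree and map predictions back to the original labels
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = le.inverse_transform(clf.predict(X))
```

For real test data, the same fitted vect should be reused with vect.transform(...) rather than refitting, so that the training and test columns line up.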

Upvotes: 0
