Reputation: 83
I am trying to create a multiclass text classifier as explained here. However, my code breaks at this line:
NB_pipeline.fit(X_train, train[category])
Below is the error which I am getting:
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)
I tried to find out what train[category] returns and I got the same error.
1) X_train is a dataframe with one column that contains customer feedback.
2) train is a dataframe with two columns; the first column contains the customer review (same as X_train) and the second column contains one of the 5 categories (Systems Error, Proactive Communication, Staff Behaviour, Website Functionalities, Others).
3) category is one of the above mentioned categories.
Below is the sample train dataframe:
Index  Feedback                                                                  Category
0      While making payment got system error. Staff behaviour was good at hotel  System error
1      While making payment got system error. Staff behaviour was good at hotel  Staff Behaviour
Upvotes: 1
Views: 309
Reputation: 86
This is one of the most commonly overlooked issues.
The reason for this error is that the column the script is looking for is not present in the dataframe. Each of the 5 categories should be its own column in the input dataframe, and each row should hold 1 or 0 depending on whether that category applies to the feedback/comment. Ideally, your input dataframe should look like this.
Index  Feedback                                                                  System error  Staff Behaviour
0      While making payment got system error. Staff behaviour was good at hotel  1             1
1      While making payment got system error.                                    1             0
2      Staff behaviour was good at hotel                                         0             1
I have reused the same comments to show what the input dataframe should look like.
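If your data is currently in the two-column layout from the question, one possible way to get to the layout above is sketched here. It assumes the columns are literally named "Feedback" and "Category" (taken from the sample table, so adjust them to your real names) and uses pd.crosstab as one option among several; the sample rows are made up from the question's example.

```python
import pandas as pd

# Hypothetical sample in the question's layout: one row per (feedback, category) pair.
feedback = "While making payment got system error. Staff behaviour was good at hotel"
train = pd.DataFrame({
    "Feedback": [feedback, feedback],
    "Category": ["System error", "Staff Behaviour"],
})

# "System error" is a value inside the Category column, not a column label,
# so train["System error"] would raise a KeyError on this layout.
has_column_before = "System error" in train.columns

# Count how often each category is attached to each distinct feedback text,
# then clip the counts to 1 so every category becomes a 0/1 indicator column.
indicators = pd.crosstab(train["Feedback"], train["Category"]).clip(upper=1)
train_wide = indicators.reset_index()
train_wide.columns.name = None  # drop the leftover "Category" axis label

categories = [c for c in train_wide.columns if c != "Feedback"]
for category in categories:
    # Each train_wide[category] is now a 0/1 Series that can serve as a fit target.
    print(category, train_wide[category].tolist())
```

After this reshaping, train_wide[category] resolves for every label that actually occurs in the data. Column lookup is case-sensitive, so the strings in your category list need to match the column labels exactly.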
Upvotes: 2