Sourabh

Multiclass Text Classification in Python

I am trying to build a multiclass text classifier as explained here. However, my code breaks at this line:

NB_pipeline.fit(X_train, train[category])

This is the error I get:

File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)

When I evaluate train[category] on its own to see what it returns, I get the same error.

1) X_train is a dataframe with one column that contains customer feedback.

2) train is a dataframe with two columns: the first column contains the customer review (same as X_train) and the second column contains one of 5 categories (Systems Error, Proactive Communication, Staff Behaviour, Website Functionalities, Others).

3) category holds one of the category names listed above.

Here is a sample of the train dataframe:

Index   Feedback                                                                  Category
0       While making payment got system error. Staff behaviour was good at hotel  System error
1       While making payment got system error. Staff behaviour was good at hotel  Staff Behaviour
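A minimal sketch that rebuilds the sample train frame above reproduces the same lookup failure. It assumes the column names "Feedback" and "Category" shown in the sample; the variable names are illustrative. Indexing with a category value such as "Staff Behaviour" raises a KeyError, because that label is a value inside the Category column, not a column name.

```python
import pandas as pd

# Rebuild the sample `train` frame from the question. The column names
# "Feedback" and "Category" are assumed from the sample table.
feedback = ("While making payment got system error. "
            "Staff behaviour was good at hotel")
train = pd.DataFrame({
    "Feedback": [feedback, feedback],
    "Category": ["System error", "Staff Behaviour"],
})

print(list(train.columns))  # ['Feedback', 'Category']

# `category` holds a category *value*, not a column name.
category = "Staff Behaviour"

lookup_failed = False
try:
    train[category]  # column lookup by a label that is not a column
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)

# The category labels live inside the "Category" column instead.
print(train["Category"].tolist())  # ['System error', 'Staff Behaviour']
```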

Answers (1)

user7467529

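This is an easily overlooked issue.

The error occurs because the column the script is looking up, train[category], does not exist in the dataframe: pandas raises a KeyError (which surfaces from its hash table lookup) when you index a DataFrame by a label that is not one of its columns. Your category names are values in the Category column, not columns themselves. Each of your 5 categories should be its own column in the input dataframe, and each row should hold 1 if that category applies to the feedback/comment and 0 otherwise. Your input dataframe should look like this: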

Index   Feedback                                                                  System error   Staff Behaviour
0       While making payment got system error. Staff behaviour was good at hotel  1              1
1       While making payment got system error.                                    1              0
2       Staff behaviour was good at hotel                                         0              1

I have reused the comments from your sample to show what the input dataframe should look like.
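A minimal sketch of getting from the long format in the question (one row per feedback/category pair) to the wide 0/1 format above, and then running the per-category fit. The question never shows how NB_pipeline is defined, so the TF-IDF plus one-vs-rest Multinomial Naive Bayes pipeline, the toy data, and the test sentence are assumptions for illustration, not the tutorial's exact code.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical long-format data: one row per (feedback, category) pair,
# which is the shape of the `train` frame in the question.
both = ("While making payment got system error. "
        "Staff behaviour was good at hotel")
long_df = pd.DataFrame({
    "Feedback": [both, both,
                 "While making payment got system error.",
                 "Staff behaviour was good at hotel"],
    "Category": ["System error", "Staff Behaviour",
                 "System error", "Staff Behaviour"],
})

# Pivot to wide format: one row per feedback, one 0/1 column per category.
# crosstab counts pairs; clip(upper=1) turns any duplicate pairs into 1.
wide = (pd.crosstab(long_df["Feedback"], long_df["Category"])
          .clip(upper=1)
          .reset_index())
wide.columns.name = None  # drop the leftover "Category" axis name

categories = ["System error", "Staff Behaviour"]
X_train = wide["Feedback"]  # a 1-D Series of strings, one document per row

# Assumed pipeline: TF-IDF features + one-vs-rest Multinomial Naive Bayes.
NB_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(MultinomialNB())),
])

predictions = {}
for category in categories:
    # wide[category] now exists, so this lookup no longer raises KeyError.
    NB_pipeline.fit(X_train, wide[category])
    predictions[category] = NB_pipeline.predict(["got a system error at payment"])[0]

print(wide)
print(predictions)
```

With the real data you would pivot all 5 categories the same way and pass the columns of the wide frame as targets. Note that X_train is a Series here: text vectorizers such as TfidfVectorizer iterate over their input, and iterating a one-column DataFrame yields its column name rather than its rows, so a one-column DataFrame X_train would be treated as a single document once the KeyError is fixed.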
