Taras

Reputation: 73

Predicting a product's category from a search term

Problem: A user searches for products with a search term, and we need to determine the categories most related to that term (ranked in descending order of relevance).

Given: A set of around 50,000 products (possibly ten times more). Each product has a title, a description, and a list of the categories it belongs to.

Model:

Pre-processing: Perform stemming and remove stop words from each product's title and description. Put all unique stemmed words in a WORDS list of size N. Put all categories in a CATEGORIES list of size M.
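A minimal sketch of this pre-processing step in Python. It assumes each product is a `(title, description, categories)` tuple; the tiny `STOP_WORDS` set and the `crude_stem` suffix stripper are illustrative stand-ins (a real pipeline would more likely use a proper stop-word list and a Porter-style stemmer, e.g. from NLTK).

```python
import re

# Tiny illustrative stop-word list (assumption); a real system would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "for", "of", "with", "in", "on", "to"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping, standing in for a real stemmer such as Porter."""
    for suffix in ("ing", "s"):
        # Only strip when at least 3 characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabularies(products):
    """Build sorted WORDS and CATEGORIES lists from (title, description, categories) tuples."""
    words, categories = set(), set()
    for title, description, cats in products:
        words.update(preprocess(title + " " + description))
        categories.update(cats)
    return sorted(words), sorted(categories)
```

Sorting both lists gives every word and category a stable index, which is what the input and output vectors of the network are built from.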

Fitting: Use a neural network with N input neurons and M output neurons.
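For illustration, here is a minimal pure-Python forward pass for such a network. The single hidden layer, its size, the ReLU activation, and the random initialization are all assumptions; the key point is that each of the M outputs uses its own sigmoid, so every category gets an independent score between 0 and 1.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    return max(0.0, z)


class TinyMultiLabelNet:
    """N inputs -> n_hidden ReLU units -> M independent sigmoid outputs (illustrative)."""

    def __init__(self, n_inputs: int, n_hidden: int, n_outputs: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_inputs = n_inputs
        # Small random weights; biases start at zero.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_outputs)]
        self.b2 = [0.0] * n_outputs

    def forward(self, x):
        """Return one probability-like score per category for the input vector x."""
        if len(x) != self.n_inputs:
            raise ValueError("input length must equal the vocabulary size N")
        hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
                for row, b in zip(self.w2, self.b2)]
```

Sigmoid outputs (rather than a softmax) are used because a product can belong to several categories at once, so the scores should not be forced to sum to 1.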

Training: For a product containing the words w1, w3, w4, w6, the input is x = [1 0 1 1 0 1 ...], where the elements whose indices correspond to those words' indices in WORDS are set to 1. If the product belongs to categories c1, c3, c25, the target is y = [1 0 1 ... 1 (25th position) ...].

Predicting: Use the stemmed tokens of the user's search term as input; the output should give a prediction of the most related categories.
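The encoding described above is a "multi-hot" vector for both input and target. A small sketch of the encoding and of ranking categories by predicted score follows; the function names are hypothetical, and search-term tokens that are not in WORDS are simply ignored.

```python
def encode_words(tokens, words):
    """Multi-hot input: x[i] = 1 if words[i] appears among the (stemmed) tokens."""
    present = set(tokens)
    return [1 if w in present else 0 for w in words]


def encode_categories(product_categories, categories):
    """Multi-hot target: y[j] = 1 if the product belongs to categories[j]."""
    present = set(product_categories)
    return [1 if c in present else 0 for c in categories]


def rank_categories(scores, categories, top_k=3):
    """Pair each category with its predicted score and sort highest first."""
    ranked = sorted(zip(categories, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

For example, with WORDS = ["w1", "w2", "w3", "w4", "w5", "w6"], `encode_words(["w1", "w3", "w4", "w6"], WORDS)` reproduces the question's x = [1, 0, 1, 1, 0, 1]. At prediction time, the search term's stemmed tokens are encoded the same way, passed through the network, and `rank_categories` turns the M output scores into the descending list of categories.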

Is this model a correct way to solve such a problem? What are the recommendations for configuring the hidden layers of the NN? Any advice will be helpful; I'm completely new to machine learning.

Thank you!

Upvotes: 1

Views: 447

Answers (1)

Leonardo Braga

Reputation: 61

I think that's the correct way of solving the problem, since you're dealing with a multi-label classification problem. That is, a sample can belong to several classes simultaneously, or to a single class, or to none of the classes (categories).
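In a multi-label setup each of the M outputs is its own yes/no decision. A common choice (an assumption here, not something the question specifies) is to train the sigmoid outputs with binary cross-entropy averaged over the labels, instead of a softmax with categorical cross-entropy. A minimal pure-Python sketch of that loss:

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over M independent labels.

    Each label is scored on its own, so a sample may have zero, one,
    or several positive categories.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)
```

Because the loss is a sum of per-label terms, a target vector with no positives (a product in none of the categories) is handled the same way as one with several.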

This is a good example in Python: multi-label classification. You can get more details here.

As for the hidden layer configuration, a first approach is to use cross-validation: try several configurations and compare their accuracy on the held-out folds. But if you want to go further, please read this.
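A minimal sketch of using k-fold cross-validation to choose a hidden-layer size. The `evaluate` callback is hypothetical: it is assumed to train a model with the given hidden size on the training indices and return a score (higher is better) on the validation indices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def select_hidden_size(candidate_sizes, n_samples, evaluate, k=5):
    """Return the hidden size with the best mean validation score, and that score."""
    best_size, best_score = None, float("-inf")
    for size in candidate_sizes:
        scores = [evaluate(size, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_size, best_score = size, mean_score
    return best_size, best_score
```

Averaging over folds reduces the chance of picking a configuration that only looks good on one lucky split; a final check on a separate test set that was never used for selection still gives a fairer estimate of accuracy.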

Upvotes: 1
