Anatoly

Reputation: 5241

Is it possible to automatically classify items in a store?

I have a database of items in a store. All of them are vegetables, fruits, nuts, berries, etc. I need to categorise them. For example, the different types of potatoes should be grouped under a single category, potato; the tomatoes under tomato; and so on.

The most intuitive approach is grouping by rules: for example, if the name of an item contains the word potato, it should be grouped under the category potato, and so on.
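A minimal sketch of the rule-based idea. The class name `RuleBasedCategorizer`, the two-entry rule table, and the "uncategorized" fallback are assumptions made for illustration only:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class RuleBasedCategorizer {
    // Hypothetical keyword -> category rules; a real store would need many more.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("potato", "potato");
        RULES.put("tomato", "tomato");
    }

    // Returns the category of the first rule whose keyword occurs in the
    // item name, or "uncategorized" when no rule matches.
    public static String categorize(String itemName) {
        String lower = itemName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (lower.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "uncategorized";
    }

    public static void main(String[] args) {
        System.out.println(categorize("Red Potatoes 1kg"));
        System.out.println(categorize("Cherry tomato"));
        System.out.println(categorize("Walnuts"));
    }
}
```

Every new category means another hand-written rule, which is what makes this approach tedious at scale.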

But there are many categories, and I'm looking for an automatic approach, for example finding the most common words in a set of items.
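To make the "most common words" idea concrete, here is a rough sketch. The class `CommonWords`, its method names, and the `minItems` threshold are hypothetical. It counts in how many item names each word occurs and keeps the frequent ones as candidate categories:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CommonWords {
    // Counts in how many item names each lowercase word appears.
    public static Map<String, Integer> wordCounts(List<String> itemNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : itemNames) {
            // Split on anything that is not a letter; the set prevents
            // counting a word twice within the same item name.
            Set<String> words = new HashSet<>(
                    Arrays.asList(name.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")));
            for (String word : words) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Words that occur in at least minItems names, most frequent first
    // (ties broken alphabetically).
    public static List<String> candidateCategories(List<String> itemNames, int minItems) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(wordCounts(itemNames).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (entry.getValue() >= minItems) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Words are compared verbatim here, so "potato" and "potatoes" are counted separately.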

I'm sure I'm not the first to tackle this problem, so it should already be solved, at least partially. Maybe there are libraries that can help, or a neural network.

Thank you in advance.

P.S. A Java-based solution is preferred, but it's not a must.

Upvotes: 0

Views: 85

Answers (1)

Roman

Reputation: 13058

From what I understand from your (admittedly sparse) example, you can simply do the following:

  1. Tokenization (in your case - just splitting into words and removing punctuation)
  2. Stemming (a Porter stemmer will do)
  3. Removing stop words

And you're done. You can use the results for tagging / categorization. There are many questions on SO dealing with these processes, for example: "Tokenizer, Stop Word Removal, Stemming in Java"
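The three steps above could be sketched roughly as follows. Note the assumptions: `ItemNormalizer` and its methods are hypothetical names, the stop-word list is a tiny made-up sample, and `stem` is a crude plural-stripper standing in for a real Porter stemmer, which is too long to reproduce here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ItemNormalizer {
    // Hypothetical, deliberately tiny stop-word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and", "with", "in"));

    // Step 1: lowercase and split on anything that is not a letter,
    // which also drops punctuation and digits.
    static List<String> tokenize(String name) {
        List<String> tokens = new ArrayList<>();
        for (String token : name.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Step 2: crude plural stripping. This is NOT the Porter algorithm,
    // only a placeholder that handles a few common English plurals.
    static String stem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y"; // berries -> berry
        }
        if (word.length() > 4 && word.endsWith("oes")) {
            return word.substring(0, word.length() - 2);       // potatoes -> potato
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);       // nuts -> nut
        }
        return word;
    }

    // Step 3: drop stop words; the remaining stems can serve as tags.
    public static List<String> normalize(String name) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(name)) {
            String stemmed = stem(token);
            if (!STOP_WORDS.contains(stemmed)) {
                result.add(stemmed);
            }
        }
        return result;
    }
}
```

With this sketch, `ItemNormalizer.normalize("Bag of Red Potatoes")` yields the tokens bag, red and potato, so any item whose tokens include potato can be tagged accordingly. For real data, swap the placeholder `stem` for a proper stemmer implementation.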

Upvotes: 1
