peak

Reputation: 11

How to classify English words by topic with Python?

How can I classify English words by topic with Python? For example, the topic THE COUNTRY AND GOVERNMENT would contain regime, politically, politician, official, democracy, and so on. Other topics include education, family, economy, subjects, and so on.

I want to sort the vocabulary of The Economist magazine and classify it by frequency and topic. I have already completed the word frequency statistics. The next step is to classify these words automatically with Python; how can I do that?

Upvotes: 0

Views: 926

Answers (2)

Farhood ET

Reputation: 1551

What you are trying to do is called "topic modelling". There are numerous ways to do this, but training a simple LDA (Latent Dirichlet Allocation) model will normally be enough. You can also do topic modelling by combining TF-IDF vectorization with LSA (Latent Semantic Analysis). This is a good guide comparing the two.

Upvotes: 1

osehyum

Reputation: 145

This sounds like a tough problem, and it is not a simple task. If I were you, I would consider two ways of doing what you ask.

  1. Make your own rules

    • Once you have finished counting the words, you need to match those words to topics. There is no free lunch: make your own rules for assigning categories. For example, Entertainment text contains many occurrences of "TV" and "drama", so if some text contains them, we can guess that it belongs to Entertainment.
  2. Machine learning

    • If you can't afford to write rules, let the machine do it. Even in this case, though, you have to label the articles with your desired classes (topics).

    • Unsupervised pre-training (e.g. clustering) can also be used here, but in the end you still need a supervised data set labelled with topics.

    • You have to decide on a taxonomy of topics.
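
The rule-based option (1) can be sketched in a few lines of plain Python. The taxonomy in `TOPIC_KEYWORDS`, the function name `classify_words`, and the sample counts are all hypothetical. You would replace them with your own topic list and your real frequency table.

```python
from collections import defaultdict

# Hypothetical hand-written taxonomy: topic name -> keywords (illustrative only).
TOPIC_KEYWORDS = {
    "government": {"regime", "politician", "official", "democracy"},
    "economy": {"inflation", "market", "tax", "trade"},
    "education": {"school", "university", "teacher", "student"},
}


def classify_words(word_counts, topic_keywords):
    """Group counted words by topic; words with no rule go under 'unclassified'."""
    # Invert the taxonomy so each keyword points at its topic.
    word_to_topic = {
        keyword: topic
        for topic, keywords in topic_keywords.items()
        for keyword in keywords
    }
    grouped = defaultdict(dict)
    for word, count in word_counts.items():
        topic = word_to_topic.get(word.lower(), "unclassified")
        grouped[topic][word] = count
    return dict(grouped)


# Made-up frequency table standing in for the word counts you already have.
word_counts = {"regime": 42, "market": 37, "teacher": 5, "banana": 1}

result = classify_words(word_counts, TOPIC_KEYWORDS)
for topic, words in result.items():
    print(topic, words)
```

Anything that lands in "unclassified" tells you where the rules are missing, which is also a reasonable way to collect words to label if you later move to the machine learning option.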

Welcome to the ML world. I hope this helps you find the right starting point.

Upvotes: 0
