Reputation: 11
How can I classify English words by topic with Python? For example, the topic THE COUNTRY AND GOVERNMENT would contain: regime, politically, politician, official, democracy, and so on. Other topics would be education, family, economy, subjects, etc.
I want to extract the vocabulary of The Economist magazine and classify the words by frequency and topic. I have already computed the word frequencies. The next step is to classify these words automatically; how can I do that with Python?
Upvotes: 0
Views: 926
Reputation: 1551
What you are trying to do is called "topic modelling". There are numerous ways to do it, but training a simple LDA (Latent Dirichlet Allocation) model is normally enough. You can also do topic modelling by combining TF-IDF vectorization with LSA (Latent Semantic Analysis). This is a good guide comparing the two.
Upvotes: 1
Reputation: 145
This sounds quite tough; it is not a simple task. If I were you, I would consider two ways to do what you ask:
1. Make your own rules.
2. Use machine learning.
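To make the first option concrete, a rule-based approach could be as simple as a hand-written lexicon that maps each topic to its words. The sketch below assumes you already have a `{word: frequency}` mapping; `TOPIC_LEXICON`, `classify_words`, and the sample counts are all hypothetical names and data for illustration.

```python
from collections import defaultdict

# Hypothetical hand-made lexicon: topic -> words that belong to it.
TOPIC_LEXICON = {
    "country and government": {"regime", "politically", "politician", "official", "democracy"},
    "economy": {"inflation", "market", "tariff", "recession"},
    "education": {"school", "university", "teacher", "curriculum"},
}

# Invert the lexicon once so each word lookup is a single dict access.
WORD_TO_TOPIC = {
    word: topic for topic, words in TOPIC_LEXICON.items() for word in words
}


def classify_words(word_counts):
    """Group a {word: frequency} mapping by topic, most frequent words first.

    Words missing from the lexicon are collected under "unclassified".
    """
    grouped = defaultdict(list)
    for word, freq in word_counts.items():
        topic = WORD_TO_TOPIC.get(word.lower(), "unclassified")
        grouped[topic].append((word, freq))
    for entries in grouped.values():
        entries.sort(key=lambda pair: pair[1], reverse=True)
    return dict(grouped)


# Made-up frequencies standing in for the real word counts.
word_counts = {"regime": 42, "democracy": 57, "inflation": 88, "teacher": 12, "banana": 3}
result = classify_words(word_counts)
for topic, entries in result.items():
    print(topic, entries)
```

The obvious cost is that every word has to be assigned by hand, which is why this only scales to a limited vocabulary.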
If you can't afford to write rules, let the machine do it. But even in that case, you have to label the articles with your desired classes (topics).
Unsupervised pre-training (e.g. clustering) can also be used here, but in the end you still need a supervised dataset labelled with topics.
You should decide on a taxonomy of topics first.
Welcome to the ML world. I hope this helps you find the right starting point.
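Once a taxonomy is fixed and some texts are labelled, a supervised classifier might look like the sketch below. It uses scikit-learn with TF-IDF features and logistic regression, which is just one reasonable choice among many. The training sentences and labels are invented for illustration; a real dataset would need far more examples per topic.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up labelled examples; the labels are your own taxonomy of topics.
train_texts = [
    "The regime promised democracy but the politician ignored the election.",
    "An official said the government will hold a democratic election.",
    "The central bank raised interest rates to curb inflation.",
    "Markets fell as inflation and interest rates hurt economic growth.",
    "The university hired teachers to improve school education.",
    "Students and teachers want better education in every school.",
]
train_labels = [
    "government", "government",
    "economy", "economy",
    "education", "education",
]

# TF-IDF turns each text into a weighted word vector; the classifier
# then learns which words are indicative of which topic.
classifier = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
classifier.fit(train_texts, train_labels)

new_texts = ["The politician defended the government before the election."]
predicted = classifier.predict(new_texts)
print(predicted[0])
```

The same pipeline can be applied to single words instead of sentences, but individual words carry very little context, so labelling and classifying whole articles or paragraphs usually works better.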
Upvotes: 0