Lourens

Using Text Sentiment as feature in machine learning model?

I am researching which features to use for my machine learning model, given the data I have. My data contains a lot of text, so I was wondering how to extract valuable features from it. Contrary to what I expected, this is often done by representing the text as a bag of words, or with something like word2vec: (http://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction)
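To make the bag-of-words idea concrete: each document becomes a fixed-length vector of word counts over a shared vocabulary. Below is a hand-rolled, standard-library sketch of roughly what scikit-learn's CountVectorizer does (its default tokenizer differs, for example it ignores one-character tokens). The helper names and example sentences are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes as tokens
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs):
    # Sorted list of every distinct token seen anywhere in the corpus
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(doc, vocabulary):
    # One count per vocabulary word, so every document gets the same vector length
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocabulary]


docs = ["The food was great", "The service was slow, the food cold"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
print(vocab)    # ['cold', 'food', 'great', 'service', 'slow', 'the', 'was']
print(vectors)  # [[0, 1, 1, 0, 0, 1, 1], [1, 1, 0, 1, 1, 2, 1]]
```

Each column of the resulting vectors is one word, which is why these representations tend to be long and sparse.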

Because my understanding of the subject is limited, I don't understand why I can't first analyze the text to get numeric values (for example: textBlob.sentiment = https://textblob.readthedocs.io/en/dev/, Google Cloud Natural Language = https://cloud.google.com/natural-language/).

Are there problems with this, or could I use these values as features for my machine learning model?

Thanks in advance for all the help!

Answers (1)

Muhammed Hasan Celik

Of course, you can convert the text input into a single number with sentiment analysis and then use this number as a feature in your machine learning model. There is nothing wrong with this approach.
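A minimal sketch of that idea, assuming a tiny made-up word list in place of a real sentiment tool. toy_polarity is a hypothetical stand-in for something like TextBlob's polarity score, not its actual algorithm, and the word-count column is just an example of another numeric feature sitting next to the sentiment score.

```python
import re

# Hypothetical, tiny sentiment lexicon; real tools such as TextBlob use a much
# larger lexicon plus extra rules (negation, intensifiers, etc.)
POSITIVE = {"great", "good", "tasty", "friendly", "excellent"}
NEGATIVE = {"bad", "slow", "cold", "rude", "terrible"}


def toy_polarity(text):
    """Score in [-1, 1]: +1 only positive words, -1 only negative, 0 neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def make_features(text):
    # One feature row: the sentiment score next to an ordinary numeric column
    return [toy_polarity(text), len(text.split())]


rows = [
    make_features("Great food and friendly staff"),
    make_features("Slow service, cold and bad food"),
    make_features("We had dinner on Tuesday"),
]
print(rows)  # [[1.0, 5], [-1.0, 6], [0.0, 5]]
```

The rows are plain numeric lists, so they can be fed to any model that accepts a feature matrix, alongside whatever other columns your data has.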

The question is what kind of information you want to extract from the text data. Sentiment analysis typically converts the text into a number between -1 and 1 that represents how positive or negative the text is. For example, you may want the sentiment of customers' comments about a restaurant to measure their satisfaction. In this case, it is fine to use sentiment analysis to preprocess the text data.

But again, sentiment analysis only gives an idea of how positive or negative a text is. If you want to cluster the text data, for example, sentiment information is not useful, since it does not tell you anything about how similar two texts are. Other approaches, such as word2vec or bag-of-words, are used to represent the text data in those tasks, because they produce a vector representation of the text instead of a single number.
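The sketch below illustrates this with three made-up reviews that all get the same toy sentiment score, while a bag-of-words cosine similarity still separates them by topic. The lexicon and the toy_polarity and cosine helpers are hypothetical, written only for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only
POSITIVE = {"great", "good", "tasty", "friendly"}
NEGATIVE = {"bad", "slow", "cold", "rude"}


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def toy_polarity(text):
    # Single-number summary: how positive or negative the text is
    toks = tokens(text)
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def cosine(a, b):
    # Cosine similarity between the bag-of-words count vectors of two texts
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


a = "great pizza and tasty pasta"
b = "great laptop and fast charger"
c = "tasty pizza and good pasta"

print([toy_polarity(t) for t in (a, b, c)])  # [1.0, 1.0, 1.0]
print(cosine(a, b), cosine(a, c))
```

Here a and c (both about pizza and pasta) have a cosine similarity of 0.8, while a and b share only "great" and "and" and score 0.4; the sentiment score is 1.0 for all three, so it could not group them by topic.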

In conclusion, the approach depends on what kind of information you need to extract from data for your specific task.