Gee

Reputation: 85

Clustering text data based on sentiment?

I am scraping reviews from Amazon with the intent of performing sentiment analysis to classify them as positive, negative, or neutral. The data I collect will be unlabeled text.

My approach to this problem would be as follows:

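1.) Label the data using a clustering algorithm such as DBSCAN, HDBSCAN, or k-means. The number of clusters would be 3, one per sentiment class (k-means takes this directly as k, while DBSCAN and HDBSCAN infer the number of clusters from the data density).

2.) Train a classification algorithm on the labeled data.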
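The two steps above could be sketched roughly as follows. This is a minimal, stdlib-only illustration under assumptions: the reviews, the `cluster_ids` (standing in for whatever step 1 produces), and the `cluster_to_sentiment` mapping are all hypothetical, and the classifier is a hand-rolled multinomial naive Bayes, not any particular library's implementation. One thing it makes concrete: cluster IDs carry no meaning on their own, so someone has to read samples from each cluster and decide which is "positive", "negative", or "neutral", and nothing guarantees the clusters follow sentiment rather than, say, product type.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real reviews need better preprocessing.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / n_docs)
            for tok in tokenize(doc):
                score += math.log((self.word_counts[label][tok] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1 output (hypothetical): one cluster ID per review.
reviews = [
    "great product love it",
    "terrible broke after a day",
    "it is okay nothing special",
    "love the quality great value",
    "awful waste of money terrible",
    "okay for the price",
]
cluster_ids = [2, 0, 1, 2, 0, 1]

# Cluster IDs are arbitrary; this mapping is assigned by reading sample reviews.
cluster_to_sentiment = {0: "negative", 1: "neutral", 2: "positive"}
labels = [cluster_to_sentiment[c] for c in cluster_ids]

# Step 2: train a classifier on the pseudo-labels.
model = NaiveBayes().fit(reviews, labels)
print(model.predict("great value love it"))  # prints: positive
```

On real data, scikit-learn's `MultinomialNB` combined with `CountVectorizer` (or `LogisticRegression`) would replace the hand-written class; the classifier can only be as good as the cluster-derived labels it learns from.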

I have never performed clustering on text data, though I am familiar with the basics of clustering. So my questions are:

  1. Is my approach correct?

  2. Are there any articles/blogs/tutorials on text-based clustering I can follow, since I am fairly new to this?

Upvotes: 1

Views: 397

Answers (1)

Meti

Reputation: 2056

I have never run such an experiment, but as far as I know, the most challenging part of this task is transforming the sentences or documents into fixed-length vectors (mapping them into a semantic space). I highly recommend using a sentiment analysis pipeline from the Hugging Face transformers library to embed the sentences (this way you can exploit some supervision). There are other options as well:

  1. Using the sentence-transformers library (straightforward and still good).
  2. Using bag-of-words (BoW) (the simplest way, but it is hard to get what you want).
  3. Using TF-IDF (still simple, and it may well do the job).
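To make "every review becomes a fixed-length vector" concrete for options 2 and 3, here is a small pure-Python sketch of BoW counts and TF-IDF weighting. The helper names (`build_vocab`, `bow_vectors`, `tfidf_vectors`) are made up for illustration; the smoothed IDF `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization mirrors scikit-learn's `TfidfVectorizer` defaults, but this is not that library's code. Each vector has one entry per vocabulary word, so its length does not depend on how long the review is.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer, for illustration only.
    return text.lower().split()


def build_vocab(docs):
    # Sorted so that vector positions are stable across runs.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vectors(docs, vocab):
    # Bag-of-words: raw term counts; words outside the vocabulary are ignored.
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in index:
                vec[index[tok]] = float(count)
        vectors.append(vec)
    return vectors


def tfidf_vectors(docs, vocab):
    n = len(docs)
    counts = bow_vectors(docs, vocab)
    # Document frequency: in how many reviews each word appears.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF, so words found in every review still get a small weight.
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    vectors = []
    for vec in counts:
        weighted = [tf * w for tf, w in zip(vec, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        vectors.append([x / norm for x in weighted])  # L2-normalize each row
    return vectors


reviews = ["good phone", "bad phone", "good good price"]
vocab = build_vocab(reviews)
for vec in tfidf_vectors(reviews, vocab):
    print([round(x, 3) for x in vec])
```

For a real scrape, `TfidfVectorizer` (or `CountVectorizer` for plain BoW) handles tokenization, vocabulary limits, and sparse storage far better than this sketch.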

Once you reach this point (every review ==> a fixed-length vector), you can use whatever clustering algorithm you like and inspect the results.
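As one example of that clustering step, below is a minimal k-means (Lloyd's algorithm) sketch in plain Python with k = 3, as in the question. It assumes each review is already a list of floats (TF-IDF rows or sentence embeddings would both fit). The farthest-point initialization is my own simplifying choice, not something from the answer: it is deterministic for a given seed but sensitive to outliers, and scikit-learn's `KMeans` (which uses k-means++ seeding by default) is the more practical option for real data. The 2D toy points are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def farthest_point_init(vectors, k, rng):
    # Start from a random point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [list(rng.choice(vectors))]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def nearest(v, centroids):
    # Index of the closest centroid under squared Euclidean distance.
    return min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))


def kmeans(vectors, k=3, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = farthest_point_init(vectors, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return labels, centroids


points = [[0, 0], [0.1, 0], [0, 0.1],
          [5, 5], [5.1, 5], [5, 5.1],
          [10, 0], [10.1, 0], [10, 0.1]]
labels, _ = kmeans(points, k=3)
print(labels)
```

The numbers in `labels` are arbitrary cluster IDs; with reviews you would still read a sample from each cluster to see whether the grouping reflects sentiment at all.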

Upvotes: 1
