Reputation: 43
I have a lot of "Search Keywords" for every product in the dataset, and I am trying to cluster the products according to these keywords.
What I'm looking to do is cluster these keywords into clusters of "similar meaning", and create a hierarchy of the clusters (structured in order of summed total number of searches per cluster).
An example cluster, "women's clothing", would ideally contain keywords along these lines (search counts in parentheses): women's clothing (1000), ladies wear (300), women's clothes (50), ladies' clothing (6), women wear (2).
I'm a beginner in NLP. Do you have any suggestions of NLP techniques for this task? Any help will be highly appreciated :-)
Upvotes: 3
Views: 1583
Reputation: 2694
I suggest using pretrained word vectors, fastText for example, so you don't have to worry about training or training data. What you would need to do:
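1. Tokenize each keyword: women's clothing -> ["women's", "clothing"]
2. Lemmatize the tokens: ["women's", "clothing"] -> ["woman", "clothing"]
3. Look up a vector for each token: vec1 = model.get_word_vector("woman"), vec2 = model.get_word_vector("clothing")
4. Average the vectors: avg = (vec1 + vec2) / 2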
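The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_VECTORS` holds made-up 3-dimensional values standing in for a pretrained model (with fastText you would call `model.get_word_vector(token)` instead), and `LEMMAS` is a hypothetical hand-written lookup standing in for a real lemmatizer.

```python
import math
import re

# Made-up 3-dimensional vectors standing in for a pretrained model.
# With fastText, model.get_word_vector(token) would replace this lookup.
TOY_VECTORS = {
    "woman": [0.9, 0.1, 0.0],
    "man": [0.1, 0.9, 0.0],
    "clothing": [0.0, 0.1, 0.9],
    "wear": [0.0, 0.2, 0.8],
}

# Hypothetical lemma table; a real lemmatizer would replace it.
LEMMAS = {
    "women's": "woman",
    "women": "woman",
    "men's": "man",
    "men": "man",
    "clothes": "clothing",
}


def tokenize(phrase):
    """Lowercase the phrase and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", phrase.lower())


def lemmatize(tokens):
    """Map each token to its lemma; unknown tokens pass through unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


def phrase_vector(phrase):
    """Average the vectors of all tokens that have an entry in the table."""
    tokens = lemmatize(tokenize(phrase))
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        raise ValueError(f"no known tokens in {phrase!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


womens_clothing = phrase_vector("women's clothing")
women_wear = phrase_vector("women wear")
mens_clothing = phrase_vector("men's clothing")
```

With these toy values, `cosine(womens_clothing, women_wear)` comes out higher than `cosine(womens_clothing, mens_clothing)`. How well that carries over to real pretrained vectors depends on the model and is worth checking on your own keywords.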
These average vectors should represent your label. The average vector of woman and clothing should be in the same region as the average of woman and wear. On the other hand, the average vector of man and clothing should be in a different region of the vector space, so your preferred clustering algorithm should catch it.
Upvotes: 5
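Once a clustering algorithm has assigned a label to each keyword, the ordering asked for in the question (clusters sorted by summed number of searches) could be sketched like this. The function name `rank_clusters` and the label values are hypothetical. The first five counts come from the question's example, while the two "men" keywords and their counts are made up for illustration.

```python
from collections import defaultdict


def rank_clusters(search_counts, labels):
    """Group keywords by cluster label and sort clusters by total searches."""
    clusters = defaultdict(list)
    for keyword, count in search_counts.items():
        clusters[labels[keyword]].append((keyword, count))

    ranked = []
    for label, members in clusters.items():
        # Within a cluster, list the most searched keyword first.
        members.sort(key=lambda pair: pair[1], reverse=True)
        total = sum(count for _, count in members)
        ranked.append((label, total, members))

    # Clusters with the largest summed search count come first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


search_counts = {
    "women's clothing": 1000,
    "ladies wear": 300,
    "women's clothes": 50,
    "ladies' clothing": 6,
    "women wear": 2,
    "men's clothing": 400,  # made-up count
    "mens wear": 120,       # made-up count
}

# Hypothetical labels, as if produced by a clustering algorithm.
labels = {
    "women's clothing": 0,
    "ladies wear": 0,
    "women's clothes": 0,
    "ladies' clothing": 0,
    "women wear": 0,
    "men's clothing": 1,
    "mens wear": 1,
}

ranked = rank_clusters(search_counts, labels)
for label, total, members in ranked:
    print(label, total, members[0][0])
```

Taking the most searched keyword as the cluster's display name is just one possible choice; it happens to give "women's clothing" for the example cluster in the question.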