Reputation: 3134
Most of the publicly available embeddings I know of were trained on news articles, whose language and vocabulary differ from those found in user or customer reviews.
Although such embeddings can be used in NLP tasks involving reviews and user-generated content, I think the difference in language plays an important role, so I would rather use embeddings trained on user-generated content such as product reviews.
I'm looking for a corpus of reviews or comments in English (German and Dutch would also be useful) to train embeddings on, or alternatively for embeddings already trained on such a corpus.
Upvotes: 0
Views: 543
Reputation: 3134
I found two corpora in English:
https://www.yelp.com/dataset_challenge
https://snap.stanford.edu/data/web-Amazon.html
And one in German:
http://www.uni-weimar.de/en/media/chairs/webis/corpora/corpus-webis-cls-10/
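
To get from one of these corpora to something an embedding trainer can consume, the reviews first have to be turned into lists of tokens. Below is a minimal sketch of that step, not a definitive pipeline. It assumes the reviews are stored as JSON lines (one JSON object per line) with the review body in a field called `text`; that layout and field name are assumptions and should be checked against the files you actually download, since a corpus in another format needs a different reader. The helper names `tokenize` and `iter_review_tokens` and the sample reviews are made up for illustration.

```python
import io
import json
import re
from typing import Iterable, Iterator, List

# \w is Unicode-aware for str patterns, so German umlauts and ß are kept.
# The optional group keeps simple contractions such as "wasn't" together.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def iter_review_tokens(
    lines: Iterable[str], text_field: str = "text"
) -> Iterator[List[str]]:
    """Yield one token list per review from JSON-lines input.

    `text_field` is an assumption about the corpus layout; adjust it
    to whatever field holds the review body in your files.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tokens = tokenize(record.get(text_field, ""))
        if tokens:
            yield tokens


# Tiny made-up sample standing in for a real review file.
sample = io.StringIO(
    '{"stars": 5, "text": "Loved it, the pizza wasn\'t greasy at all!"}\n'
    "\n"
    '{"stars": 1, "text": "Meh. Won\'t come back."}\n'
)
corpus = list(iter_review_tokens(sample))
```

For a real corpus you would pass an open file handle instead of the in-memory sample. The resulting token lists are in the shape that word-embedding trainers such as gensim's `Word2Vec` accept as input (an iterable of token lists); the tokenizer here is deliberately crude and may need replacing for noisy review text.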
Upvotes: 1