Reputation: 3305
I have a data set like the one below, in CSV format.
FileName,Topic,Tag,Frequency
File-1,Topic -1,Tag-1,10
File-2,Topic -2,Tag-2,10
File-3,Topic -3,Tag-2,10
File-4,Topic -4,Tag-4,10
File-5,Topic -1,Tag-5,10
File-6,Topic -3,Tag-1,10
File-7,Topic -1,Tag-1,10
I need to find correlations between the tags using Mahout's LDA (Latent Dirichlet Allocation) algorithm. Can anybody please help me work out how to do that with Apache Mahout?
I am also confused about exactly what input format Mahout expects.
It would be helpful if somebody could share some good resources for Mahout beginners.
Upvotes: 1
Views: 1164
Reputation: 78
I might be late in answering, but Mahout no longer supports the `lda` job in versions above 0.6. One has to use `cvb` (collapsed variational Bayes) instead of `lda` to run topic models.
The following links can help you:
https://mahout.apache.org/users/clustering/lda-commandline.html
https://mahout.apache.org/users/clustering/latent-dirichlet-allocation.html
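On the input-format question: as far as I know, Mahout's text pipeline starts from a directory of plain-text files (one per document), which `seqdirectory` turns into SequenceFiles before vectorization and `cvb`. A minimal sketch of reshaping the CSV from the question into that layout, in Python, might look like this. The function names, the `group_by` parameter, the output directory and the hyphen-to-underscore replacement are all my own assumptions for illustration, not anything Mahout requires; check the linked pages for the exact commands and flags.

```python
import csv
import io
import os
from collections import defaultdict

# Sample rows copied from the question.
CSV_TEXT = """FileName,Topic,Tag,Frequency
File-1,Topic -1,Tag-1,10
File-2,Topic -2,Tag-2,10
File-3,Topic -3,Tag-2,10
File-4,Topic -4,Tag-4,10
File-5,Topic -1,Tag-5,10
File-6,Topic -3,Tag-1,10
File-7,Topic -1,Tag-1,10
"""


def build_documents(csv_text, group_by="FileName"):
    """Return {document name: text}, repeating each tag `Frequency` times.

    `group_by` (hypothetical parameter) selects the CSV column whose values
    define what counts as one document.
    """
    docs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: a tokenizer may split "Tag-1" at the hyphen, so the
        # tag is rewritten as a single underscore-joined token.
        token = row["Tag"].strip().replace("-", "_")
        docs[row[group_by].strip()].extend([token] * int(row["Frequency"]))
    return {name: " ".join(tokens) for name, tokens in docs.items()}


def write_documents(docs, out_dir):
    """Write one plain-text file per document into `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    for name, text in docs.items():
        # Spaces in names such as "Topic -1" are removed for the file name.
        file_name = name.replace(" ", "") + ".txt"
        with open(os.path.join(out_dir, file_name), "w", encoding="utf-8") as fh:
            fh.write(text + "\n")


if __name__ == "__main__":
    # "tag_docs" is a hypothetical output directory.
    write_documents(build_documents(CSV_TEXT, group_by="Topic"), "tag_docs")
```

Note that in the sample every file carries a single tag, so grouping by `FileName` gives one-word documents. Grouping by `Topic` (or whatever unit the tags are supposed to co-occur in) is probably what makes a topic model meaningful here.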
Upvotes: 1