Reputation: 2089
I want to try the lda-c code by Blei et al., available at this link.
I have compiled the code, and when I run ./lda in my terminal, the following result is displayed.
usage : lda est [initial alpha] [k] [settings] [data] [random/seeded/manual=filename/*] [directory]
lda inf [settings] [model] [data] [name]
This suggests that it has been compiled correctly.
However, despite reading the README.txt file there, I am not able to run the LDA code successfully. It either reports Segmentation fault (core dumped) or is killed.
What am I missing? How do I use it on the example data they provide?
I have read the Stack Overflow answer to the question asked here, but it was not useful because I don't know the default values.
P.S.: I am a beginner.
Upvotes: 1
Views: 799
Reputation: 1993
Are you using ap.txt instead of ap.dat by any chance? lda-c doesn't take raw sentences or marked-up data as input; it takes a bag-of-words representation of each document. When ap.dat has a line like
186 0:1 6144:1 3586:2 ...
it means that the corresponding document has 186 distinct words: word 0 appears once, word 6144 appears once, word 3586 appears twice, and so on.
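To make the format concrete, here is a minimal Python sketch of how one might produce and check such lines. It is not part of lda-c; the function names `encode_document` and `parse_line` and the whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter


def encode_document(tokens, vocab):
    """Return one line in the "N id:count id:count ..." format described above.

    `vocab` maps each word to an integer id and is extended in place
    whenever a new word is seen (hypothetical helper, not from lda-c).
    """
    counts = Counter()
    for tok in tokens:
        # Assign the next free id to words we have not seen before.
        idx = vocab.setdefault(tok, len(vocab))
        counts[idx] += 1
    pairs = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    # The leading number is the count of distinct words, not total words.
    return f"{len(counts)} {pairs}".strip()


def parse_line(line):
    """Parse one line back into {word_id: count} and sanity-check it."""
    fields = line.split()
    n_distinct = int(fields[0])
    counts = {}
    for field in fields[1:]:
        idx, cnt = field.split(":")
        counts[int(idx)] = int(cnt)
    if len(counts) != n_distinct:
        raise ValueError(
            f"header says {n_distinct} distinct words, found {len(counts)}"
        )
    return counts


if __name__ == "__main__":
    vocab = {}
    print(encode_document("the cat sat on the mat".split(), vocab))
    print(parse_line("3 0:1 6144:1 3586:2"))
```

Running `parse_line` over every line of an input file is a quick way to confirm that the file is in the numeric format rather than raw text before passing it to lda.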
This command works for me (using Blei's original code):
./lda est 0.1 10 settings.txt ap.dat random modeldir
(Feel free to tweak the initial alpha (0.1) and number of topics (10) as you wish.)
Upvotes: 2