kingmakerking

Reputation: 2089

Topic Modeling: How to use the LDA in C for example data?

I want to try the lda-c code by Blei et al., as found at this link.

I have compiled the code, and when I run ./lda in my terminal, the following result is displayed.

usage : lda est [initial alpha] [k] [settings] [data] [random/seeded/manual=filename/*] [directory]
        lda inf [settings] [model] [data] [name]

So it appears to have compiled correctly.

However, despite reading the README.txt file there, I am not able to run the LDA code successfully. It either says Segmentation fault (core dumped) or Killed.

What am I missing? How do I use it on the example data they provide?

I have read the Stack Overflow answer to the question asked here, but it was not useful as I don't know the default values.

P.S.: I am a beginner.

Upvotes: 1

Views: 799

Answers (1)

Ray

Reputation: 1993

Are you using ap.txt instead of ap.dat by any chance? lda-c doesn't take raw sentences or marked-up text as input; it takes bag-of-words counts, one line per document. When ap.dat has a line like 186 0:1 6144:1 3586:2 ..., it means that the corresponding document has 186 distinct words: word 0 appears once, word 6144 appears once, word 3586 appears twice, and so on.
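If it helps to see the format concretely, here is a rough Python sketch that builds one such line from a tokenized document and parses it back. It is not part of lda-c: the function names encode_document and parse_line are made up for illustration, and it assumes plain whitespace tokenization and word ids handed out in order of first appearance.

```python
from collections import Counter


def encode_document(tokens, vocab):
    """Turn a list of word tokens into one "N id:count id:count ..." line.

    `vocab` maps word -> integer id and is extended in place for unseen words.
    (Hypothetical helper, not an lda-c function.)
    """
    counts = Counter()
    for tok in tokens:
        # Assign the next free integer id the first time a word is seen.
        term_id = vocab.setdefault(tok, len(vocab))
        counts[term_id] += 1
    # First field is the number of distinct words, then one id:count pair each.
    fields = [str(len(counts))] + [f"{tid}:{n}" for tid, n in counts.items()]
    return " ".join(fields)


def parse_line(line):
    """Parse one line back into (num_distinct_words, {word_id: count})."""
    fields = line.split()
    num_distinct = int(fields[0])
    counts = {}
    for field in fields[1:]:
        term_id, n = field.split(":")
        counts[int(term_id)] = int(n)
    if num_distinct != len(counts):
        raise ValueError("leading count does not match number of id:count pairs")
    return num_distinct, counts


vocab = {}
line = encode_document("the cat sat on the mat".split(), vocab)
print(line)              # 5 0:2 1:1 2:1 3:1 4:1
print(parse_line(line))  # (5, {0: 2, 1: 1, 2: 1, 3: 1, 4: 1})
```

Writing one such line per document to a file gives the kind of input that goes in the [data] slot; the raw text itself does not.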

This command works for me (using Blei's original code):

./lda est 0.1 10 settings.txt ap.dat random modeldir

(Feel free to tweak the initial alpha (0.1) and number of topics (10) as you wish.)

Upvotes: 2
