user1543935

Reputation: 25

MALLET Dirichlet parameter higher than 1

I've been using MALLET to perform topic modeling (LDA).

I tried to discover 20 topics in a dataset. The outcome is the following (the list of keywords is not important for this question):

0   0.05013 list_of_topic_keywords_0
1   0.06444 list_of_topic_keywords_1
2   0.04946 list_of_topic_keywords_2
3   0.14458 list_of_topic_keywords_3
4   0.09248 list_of_topic_keywords_4
5   0.04865 list_of_topic_keywords_5
6   0.0977  list_of_topic_keywords_6
7   0.0653  list_of_topic_keywords_7
8   0.04557 list_of_topic_keywords_8
9   0.07494 list_of_topic_keywords_9
10  0.03577 list_of_topic_keywords_10
11  0.02867 list_of_topic_keywords_11
12  0.04184 list_of_topic_keywords_12
13  0.05251 list_of_topic_keywords_13
14  0.04231 list_of_topic_keywords_14
15  0.03207 list_of_topic_keywords_15
16  0.13064 list_of_topic_keywords_16
17  0.04922 list_of_topic_keywords_17
18  1.0515  list_of_topic_keywords_18
19  0.04922 list_of_topic_keywords_19

I've read that the second number in each row (e.g. 0.05013 in row 0) represents the Dirichlet parameter. I thought this number represented the importance of the topic (its presence throughout the documents), and I believed the values should sum to 1.

However, this is not the case: topic 18 alone has a value of 1.0515.

Could someone explain to me what this parameter really represents and why it's higher than 1 for a particular topic?

Thanks in advance.

Upvotes: 0

Views: 242

Answers (1)

Ben Allison

Reputation: 7394

Because the parameters of a Dirichlet are only constrained to be positive reals. They're not proportions, so they don't have to sum to 1 and any one of them can exceed 1. Samples from a Dirichlet are proportions (it has support on the simplex).

First place to check: https://en.wikipedia.org/wiki/Dirichlet_distribution

Size does reflect relative importance. If you normalise a particular parameter by the sum of all the Dirichlet parameters, you get the expected value of that topic's proportion, but don't make the mistake of thinking this expected value is the proportion itself.
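To make the normalisation concrete, here is a minimal Python sketch using the 20 values from the question. The variable names and the `sample_dirichlet` helper are my own for illustration and are not part of MALLET. The sampling step relies on the standard construction of a Dirichlet draw from normalised Gamma draws, and the seed is an arbitrary choice.

```python
import random

# Dirichlet parameters copied from the MALLET output in the question.
alphas = [
    0.05013, 0.06444, 0.04946, 0.14458, 0.09248,
    0.04865, 0.0977, 0.0653, 0.04557, 0.07494,
    0.03577, 0.02867, 0.04184, 0.05251, 0.04231,
    0.03207, 0.13064, 0.04922, 1.0515, 0.04922,
]

# The parameters themselves do not sum to 1.
alpha_sum = sum(alphas)

# Expected proportion of each topic: alpha_k / sum(alpha).
expected = [a / alpha_sum for a in alphas]


def sample_dirichlet(params, rng):
    """Draw one proportion vector (hypothetical helper, not a MALLET call).

    Uses independent Gamma(alpha_k, 1) draws divided by their total.
    """
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed, an arbitrary choice
sample = sample_dirichlet(alphas, rng)

print(f"sum of parameters: {alpha_sum:.5f}")
print(f"expected proportion of topic 18: {expected[18]:.3f}")
print(f"sum of expected proportions: {sum(expected):.3f}")
print(f"sum of one sampled proportion vector: {sum(sample):.3f}")
```

The parameters add up to about 2.247, so topic 18's expected proportion is roughly 0.47, while any single sampled vector sums to 1 and can differ a lot from those expected values.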

Upvotes: 2
