Reputation: 457
In Hierarchical Dirichlet Process, the authors give an interpretation of the HDP using the Chinese Restaurant Franchise. It says that each restaurant has many tables and that different tables in one restaurant may share a common dish. The dish here can be regarded as a topic in the document, so how should we understand the tables in each document? I think different tables should order different dishes; if two tables have the same dish, why not merge them into one? Thanks a lot.
Upvotes: 1
Views: 249
Reputation: 183
I am also trying to explain this to myself, so I will give it a go.
It helps if we start with the graphical representation of the stick-breaking construction of the vanilla Dirichlet Process Mixture Model:
You can think of this as the generative model of your data. The weights $\pi$ are generated via stick-breaking with parameter $\alpha_0$ and define the probabilities of the mixture components. For each data point $i$ we sample $z_i \sim \mathrm{Cat}(\pi)$, which defines the cluster membership of that point. So the $z_i$ are index variables (as explained in the answer by Vadim Smolyakov) which define which of the $\theta_k$ will be used to generate point $x_i$. In the Chinese restaurant metaphor, the tables of the restaurant represent which $\theta_k$ is associated with a given cluster, and the $\pi_k$ represent the probabilities of seating the next customer at that table (adding the next data point to that cluster).
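To make this concrete, here is a minimal sketch of the truncated stick-breaking construction in NumPy. The Gaussian base measure for the $\theta_k$, the truncation level, and all numeric values are assumptions for illustration, not part of the original model description:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha0 = 1.0   # DP concentration parameter
K = 50         # truncation level (finite approximation to the infinite stick)

# Stick-breaking: beta_k ~ Beta(1, alpha0), pi_k = beta_k * prod_{l<k} (1 - beta_l)
betas = rng.beta(1.0, alpha0, size=K)
pis = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))

# Cluster parameters theta_k drawn from an assumed base measure H = N(0, 1)
thetas = rng.normal(0.0, 1.0, size=K)

# For each data point i: z_i ~ Cat(pi), then x_i is generated from theta_{z_i}
z = rng.choice(K, size=100, p=pis / pis.sum())  # renormalize the truncated stick
x = rng.normal(thetas[z], 0.1)
```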
In a hierarchical Dirichlet process, we have two vanilla DPs operating at different levels:
So now we have an infinite number of restaurants (where each restaurant represents a group within your data), and each restaurant has an infinite number of tables which can serve any of an infinite number of dishes, but the dish menu is shared among all restaurants.
Since we have two DPs, we also need two sets of index variables like $z_i$ in the vanilla example: one for the table at which a customer is seated in a given restaurant, $t_{ji}$, and one for the dish that is served at that table, $k_{jt}$.
This leads to the situation where you can have multiple tables in the same restaurant serving the same dish. From the perspective of the table-level DP this has no special meaning, since that DP is only concerned with the counts of customers at each table when updating its predictive posterior.
At the dish-level DP this information is indeed summed out: the probability that a new table serves any of the known $K$ dishes (or a new one) depends on how many tables across all restaurants are already serving that dish, irrespective of which restaurant they are in or how many customers sit at them.
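As a hedged illustration (a sketch of the metaphor, not anyone's actual implementation), the following simulates the two-level CRF seating process with symbolic dish indices; $\alpha$, $\gamma$, and the customer counts are assumed values. Running it shows that two tables in one restaurant can easily end up serving the same dish, which is exactly the situation the question asks about:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, gamma = 1.0, 1.0   # table-level and dish-level concentrations (assumed)

n_dishes = 0              # K: dishes created so far, shared by all restaurants
dish_table_counts = []    # m_k: number of tables (across restaurants) serving dish k

def seat_customers(n_customers):
    """Seat customers in one restaurant; return the dish index of each table."""
    global n_dishes
    table_counts = []     # n_jt: customers at each table of this restaurant
    table_dishes = []     # k_jt: dish served at each table
    for _ in range(n_customers):
        # Table-level DP: existing table w.p. prop. to its count,
        # new table w.p. prop. to alpha
        probs = np.array(table_counts + [alpha], dtype=float)
        t = rng.choice(len(probs), p=probs / probs.sum())
        if t == len(table_counts):   # open a new table
            # Dish-level DP: existing dish w.p. prop. to m_k,
            # new dish w.p. prop. to gamma
            dprobs = np.array(dish_table_counts + [gamma], dtype=float)
            k = rng.choice(len(dprobs), p=dprobs / dprobs.sum())
            if k == n_dishes:        # a brand-new dish on the shared menu
                dish_table_counts.append(0)
                n_dishes += 1
            dish_table_counts[k] += 1
            table_counts.append(1)
            table_dishes.append(k)
        else:
            table_counts[t] += 1
    return table_dishes

for j in range(3):
    print(f"restaurant {j}: dish per table = {seat_customers(30)}")
```

Note that nothing in the table-level step looks at dishes at all, which is why duplicate dishes within a restaurant are perfectly legal: merging such tables would change the table-level customer counts and hence the clustering of customers within the group.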
A very nice resource is "Hierarchical Dirichlet Process: A Gentle Introduction", Xiaodong Yu, University of Maryland, College Park, September 13, 2009 (it has some typos), which explains what I said a lot better than I just did. The second figure above is from there.
https://yuxiaodong.wordpress.com/wp-content/uploads/2009/09/hdp_introduction.pdf
Upvotes: 0
Reputation: 1197
In the Chinese Restaurant Franchise (CRF), each document is a restaurant, each word is a customer, and cluster parameters are dishes served to tables from a global menu. A customer enters a restaurant and sits at an occupied table with probability proportional to the number of customers already at that table, or sits at a new table with probability proportional to $\alpha$. A new table is then assigned a particular dish with probability proportional to the number of tables (across all restaurants) already serving that dish, or a new dish with probability proportional to $\gamma$.
Thus, for every customer we have an index that maps the customer to a table, and for every table we have an index that maps the table to one of the dishes. A Gibbs sampling algorithm first samples the table associated with each data point, and then samples the dish associated with each table. For more details, see Yee Whye Teh's implementation.
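As a hedged sketch (not Teh's actual implementation), this is the unnormalized conditional a CRF Gibbs sampler uses to resample one customer's table assignment; the likelihood `f` is a placeholder assumption here, standing in for the dish-specific predictive probability of the word:

```python
import numpy as np

def table_posterior(n_jt, k_jt, m_k, alpha, gamma, f, x):
    """Unnormalized p(t_ji = t | everything else) for one customer x.

    n_jt : customers at each existing table of restaurant j (with x removed)
    k_jt : dish index served at each existing table
    m_k  : number of tables franchise-wide serving each dish
    f    : f(x, k) = predictive likelihood of x under dish k;
           f(x, None) = predictive under the base measure (i.e. a new dish)
    """
    # Existing table t: proportional to its customer count times the
    # likelihood of x under the dish already served at that table.
    probs = [n_jt[t] * f(x, k_jt[t]) for t in range(len(n_jt))]
    # New table: alpha times the dish-level predictive, a mixture over
    # existing dishes (weights m_k) and a brand-new dish (weight gamma).
    m = np.asarray(m_k, dtype=float)
    dish_mix = sum(m[k] * f(x, k) for k in range(len(m_k))) + gamma * f(x, None)
    probs.append(alpha * dish_mix / (m.sum() + gamma))
    return np.array(probs)
```

In the full sampler one would normalize this vector, draw a table, and, if a new table is opened, draw its dish with probabilities proportional to $m_k$ and $\gamma$, exactly as in the seating rule described above.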
Upvotes: 1