user3086871
user3086871

Reputation: 721

How to determine time complexity of EM algorithm of probabilistic PCA?

I was studying probabilistic pca from bishop's book, there an EM algo is provdied to calculate principal subspace.

enter image description here

enter image description here

Here M is MxM matrix, W is DxM matrix and (xn − x) is vector Dx1 matrix. Later in the book there is statement regarding the time complexity: "Instead, the most computationally demanding steps are those involving sums over the data set that are O(NDM)."

I was wondering if anyone can help me understanding the time complexity of the algorithm. Thanks in advance.

Upvotes: 1

Views: 337

Answers (1)

lejlot
lejlot

Reputation: 66850

Let us go one by one

  1. E[zn] = M^-1 W' (xn - x)

    • M^-1 can be precomputed, thus you do not pay O(M^3) everytime you need this kind of value, but rather a single O(M^3) cost at the end
    • despite that it is multiplication of matrices of sizes MxM * MxD * Dx1 which is O(M^2D)
    • result is of size Mx1
  2. E[zn zn'] = sigma^2 M^-1 + E[zn]E[zn]'

    • sigma^2 M^-1 is just multiplication by constant thus linear in size of matrix, O(M^2)
    • second operation is outer product of Mx1 and 1xM vectors, thus result is MxM again, and takes O(M^2) too
    • result is M x M matrix
  3. Wnew = [SUM (xn-x) E[zn]'][SUM E[zn zn']]

    • First part is N times repeated (sum) operation of multiplying Dx1 matrix by 1xM, thus complexity is O(NDM); result is of size D x M
    • Second part is again sum of N elements, each being a matrix of M x M, thus in total O(NM^2)
    • Finally we compute product of D x M and M x M, which is O(DM^2), and again results in D x M matrix
  4. sigma^2new = 1/ND SUM[||xn-x||^2 - 2E[zn]'Wnew'(xn-x) + Tr(E[zn zn']Wnew'Wnew)]

    • Again we sum N times, this time 3 element sum - first part is just a norm, thus we compute it in O(D) (linear in size of vectors), second term is multiplication of 1 x M, M x D and D x 1 resulting in complexity of O(MD) (per each of iterations, thus in total O(NMD)), and last part is again about multiplying three matrices of sizes M x M, M x D, D x M thus leading to O(M^3D) (*N), but you just need the trace and you can precompute Wnew'Wnew, thus this part is just trace of MxM times MxM matrices leading to O(M^2) (*N)

In total you get O(M^3) + O(NMD) + O(M^2D) + O(M^2N), and I suppose there is an assumption that M<=D<=N thus O(NMD)

Upvotes: 2

Related Questions