Nagabhushan S N

Reputation: 7267

Compute KL divergence between rows of a matrix and a vector

I have a matrix (numpy 2d array) in which each row is a valid probability distribution. I have another vector (numpy 1d array), again a probability distribution. I need to compute the KL divergence between each row of the matrix and the vector. Is it possible to do this without using for loops?

This question asks the same thing, but none of the answers solve my problem. One of them suggests using a for loop, which I want to avoid since I have large data. Another answer provides a solution in TensorFlow, but I want one for numpy arrays.

scipy.stats.entropy computes the KL divergence between 2 vectors, but I couldn't figure out how to use it when one of them is a matrix.
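For concreteness, this is the straightforward loop version I would like to avoid (a minimal sketch, assuming the matrix is p with one distribution per row and the vector is q):

import numpy as np
from scipy.stats import entropy

# p: (n, k) matrix, each row a probability distribution
# q: (k,) probability vector
# entropy(row, q) gives the KL divergence of row from q
kl = np.array([entropy(row, q) for row in p])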

Upvotes: 0

Views: 2525

Answers (1)

Warren Weckesser

Reputation: 114841

The function scipy.stats.entropy can, in fact, do the vectorized calculation, but you have to reshape the arguments appropriately for it to work. When the inputs are two-dimensional arrays, entropy expects the columns to hold the probability vectors. In the case where p is two-dimensional and q is one-dimensional, a trivial dimension must be added to q to make the arguments compatible for broadcasting.

Here's an example. First, the imports:

In [10]: import numpy as np                                                     

In [11]: from scipy.stats import entropy                                        

Create a two-dimensional p whose rows are the probability vectors, and a one-dimensional probability vector q:

In [12]: np.random.seed(8675309)                                                

In [13]: p = np.random.rand(3, 5)                                               

In [14]: p /= p.sum(axis=1, keepdims=True)                                      

In [15]: q = np.random.rand(5)                                                  

In [16]: q /= q.sum()                                                           

In [17]: p                                                                      
Out[17]: 
array([[0.32085531, 0.29660176, 0.14113073, 0.07988999, 0.1615222 ],
       [0.05870513, 0.15367858, 0.29585406, 0.01298657, 0.47877566],
       [0.1914319 , 0.29324935, 0.1093297 , 0.17710131, 0.22888774]])

In [18]: q                                                                      
Out[18]: array([0.06804561, 0.35392387, 0.29008139, 0.04580467, 0.24214446])

For comparison with the vectorized result, here's the result computed using a Python loop.

In [19]: [entropy(t, q) for t in p]                                             
Out[19]: [0.32253909299531597, 0.17897138916539493, 0.2627905326857023]

To make entropy do the vectorized calculation, the columns of the first argument must be the probability vectors, so we'll transpose p. Then, to make q compatible with p.T, we'll reshape it into a two-dimensional array with shape (5, 1) (i.e. it contains a single column):

In [20]: entropy(p.T, q.reshape(-1, 1))                                         
Out[20]: array([0.32253909, 0.17897139, 0.26279053])

Note: It is tempting to use q.T as the second argument, but that won't work. In NumPy, the transpose operation only swaps the lengths of existing dimensions--it never creates new dimensions. So the transpose of a one-dimensional array is itself. That is, q.T is the same shape as q.
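As a quick sanity check (a minimal sketch reusing the q above), the shapes bear this out; q[:, np.newaxis] is an equivalent way to add the trivial dimension:

import numpy as np

q = np.array([0.06804561, 0.35392387, 0.29008139, 0.04580467, 0.24214446])
print(q.T.shape)               # (5,)  -- transposing a 1-d array changes nothing
print(q.reshape(-1, 1).shape)  # (5, 1) -- a single column, ready to broadcast
print(q[:, np.newaxis].shape)  # (5, 1) -- an equivalent way to add the axis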


Older version of this answer follows...

You can use scipy.special.kl_div or scipy.special.rel_entr to do this. Here's an example.

In [17]: import numpy as np 
    ...: from scipy.stats import entropy 
    ...: from scipy.special import kl_div, rel_entr 

Make p and q for the example. p has shape (3, 5); the rows are the probability distributions. q is a 1-d array with length 5.

In [18]: np.random.seed(8675309) 
    ...: p = np.random.rand(3, 5) 
    ...: p /= p.sum(axis=1, keepdims=True) 
    ...: q = np.random.rand(5) 
    ...: q /= q.sum()

This is the calculation that you want, using a Python loop and scipy.stats.entropy. I include this here so the result can be compared to the vectorized calculation below.

In [19]: [entropy(t, q) for t in p]                                                                                                          
Out[19]: [0.32253909299531597, 0.17897138916539493, 0.2627905326857023]

We have constructed p and q so that the probability vectors each sum to 1. In this case, the above result can also be computed in a vectorized calculation with scipy.special.rel_entr or scipy.special.kl_div. (I recommend rel_entr. kl_div adds and subtracts additional terms that will ultimately cancel out in the sum, so it does a bit more work than necessary.) These functions compute only the point-wise part of the calculations; you have to sum the result to get the actual entropy or divergence.

In [20]: rel_entr(p, q).sum(axis=1)                                                                              
Out[20]: array([0.32253909, 0.17897139, 0.26279053])

In [21]: kl_div(p, q).sum(axis=1)                                                                                
Out[21]: array([0.32253909, 0.17897139, 0.26279053])
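To see why the two agree after summing, here is a minimal sketch of the element-wise definitions: rel_entr(p, q) computes p * log(p / q), while kl_div(p, q) adds the terms -p + q, which cancel in each row sum here because every row of p and the vector q sum to 1.

import numpy as np
from scipy.special import kl_div, rel_entr

np.random.seed(8675309)
p = np.random.rand(3, 5)
p /= p.sum(axis=1, keepdims=True)
q = np.random.rand(5)
q /= q.sum()

# element-wise definitions (valid here since all entries are positive)
print(np.allclose(rel_entr(p, q), p * np.log(p / q)))        # True
print(np.allclose(kl_div(p, q), p * np.log(p / q) - p + q))  # True

# the extra -p + q terms sum to zero per row, so the row sums agree
print(np.allclose(rel_entr(p, q).sum(axis=1), kl_div(p, q).sum(axis=1)))  # True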

Upvotes: 1
