Crystal
Crystal

Reputation: 137

What is the difference between MATLAB/Octave corr and Python numpy.correlate?

I am trying to port a MATLAB/Octave program to Python using NumPy 1.8.0 and Python 2.7.3. I've used this reference as help in converting MATLAB functions to NumPy methods with great success, until I get to the point where I want to compute the correlation between two matrices.

The first matrix is 40000x25 floats, the second matrix is 40000x1 ints. In Octave I use the statement corr(a,b) and get a 25x1 matrix of floats. Trying the corresponding method in NumPy (numpy.correlate(a,b)) produces an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_1a9aa5a_20130415-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 751, in correlate
  return multiarray.correlate2(a,v,mode)
ValueError: object too deep for desired array

I can get it to work if I change the code to calculate a correlation for each column of a, like so:

for i in range(25):
    c2[i] = numpy.correlate(a[:,i], b)

However, the values in the c2 array are different than the output from Octave. Octave returns a 25x1 matrix of floats all less than 1. The values I get from NumPy are floats between -270 and 900.

I have tried to understand what the two algorithms are doing under the hood but have failed miserably. Can someone point out my logic failure?

Upvotes: 7

Views: 5168

Answers (1)

voithos
voithos

Reputation: 70602

It appears that there exists a numpy.corrcoef which computes the correlation coefficients, as desired. However, its interface is different from the Octave/Matlab corr.

First of all, by default, the function treats rows as variables, with the columns being observations. To mimic the behavior of Octave/Matlab, you can pass a flag which reverses this.

Also, according to this answer, the numpy.cov function (which corrcoef uses internally, I assume) returns a 2x2 matrix, each of which contain a specific covariance:

cov(a,a)  cov(a,b)

cov(a,b)  cov(b,b)

As he points out, the [0][1] element is what you'd want for cov(a,b). Thus, perhaps something like this will work:

for i in range(25):
    c2[i] = numpy.corrcoef(a[:,i], b, rowvar=0)[0][1]

For reference, here are some excerpts of the two functions that you had tried. It seems to be that they perform completely different things.

Octave:

— Function File: corr (x, y)

Compute matrix of correlation coefficients.

If each row of x and y is an observation and each column is a variable, then the (i, j)-th entry of corr (x, y) is the correlation between the i-th variable in x and the j-th variable in y.

      corr (x,y) = cov (x,y) / (std (x) * std (y))

If called with one argument, compute corr (x, x), the correlation between the columns of x.

And Numpy:

numpy.correlate(a, v, mode='valid', old_behavior=False)[source]

Cross-correlation of two 1-dimensional sequences.

This function computes the correlation as generally defined in signal processing texts:

z[k] = sum_n a[n] * conj(v[n+k])

with a and v sequences being zero-padded where necessary and conj being the conjugate.

Upvotes: 8

Related Questions