Reputation: 10010
Let's say I have a set of vectors (readings from sensor 1, readings from sensor 2, readings from sensor 3 -- indexed first by timestamp and then by sensor id) that I'd like to correlate with a separate set of vectors (temperature, humidity, etc. -- also indexed first by timestamp and then by type).
What is the cleanest way in numpy to do this? It seems like it should be a rather simple function...
In other words, I'd like to see:
> a.shape
(365,20)
> b.shape
(365, 5)
> correlations = magic_correlation_function(a,b)
> correlations.shape
(20, 5)
Cheers, /YGA
P.S. I've been asked to add an example.
Here's what I would like to see:
In [27]: x
Out[27]:
array([[ 0. ,  0. ,  0. ],
       [-1. ,  0. , -1. ],
       [-2. ,  0. , -2. ],
       [-3. ,  0. , -3. ],
       [-4. ,  0.1, -4. ]])

In [28]: y
Out[28]:
array([[0. , 0. ],
       [1. , 0. ],
       [2. , 0. ],
       [3. , 0. ],
       [4. , 0.1]])

In [29]: magical_correlation_function(x, y)
Out[29]:
array([[-1.        ,  0.70710678,  1.        ],
       [-0.70710678,  1.        ,  0.70710678]])
P.S. 2: Whoops, I mis-transcribed my example. Sorry all. Fixed now.
Upvotes: 4
Views: 1803
Reputation: 111866
Will this do what you want?
correlations = numpy.dot(a.T, b)
Note: if you do this, you'll probably want to standardize or whiten each column of a and b first, e.g. something equivalent to this:
a = (a - a.mean(axis=0)) / numpy.sqrt(a.var(axis=0))
b = (b - b.mean(axis=0)) / numpy.sqrt(b.var(axis=0))
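Putting the two steps together, here is a minimal sketch of the magic_correlation_function the question asks for. The function body and the random test data are my illustration, not code from this answer; it assumes plain Pearson correlation between columns.

import numpy

def magic_correlation_function(a, b):
    # Standardize each column to zero mean and unit standard deviation.
    a_std = (a - a.mean(axis=0)) / a.std(axis=0)
    b_std = (b - b.mean(axis=0)) / b.std(axis=0)
    # Pearson correlation of every column of a with every column of b.
    return numpy.dot(a_std.T, b_std) / a.shape[0]

a = numpy.random.randn(365, 20)
b = numpy.random.randn(365, 5)
print(magic_correlation_function(a, b).shape)   # prints (20, 5)

Dividing by a.shape[0] matches the population (ddof=0) standard deviations used above, so each entry lands in [-1, 1].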
Upvotes: 1
Reputation: 8128
The simplest thing I could find was to use the scipy.stats package:
In [8]: x
Out[8]:
array([[ 0. ,  0. ,  0. ],
       [-1. ,  0. , -1. ],
       [-2. ,  0. , -2. ],
       [-3. ,  0. , -3. ],
       [-4. ,  0.1, -4. ]])

In [9]: y
Out[9]:
array([[0. , 0. ],
       [1. , 0. ],
       [2. , 0. ],
       [3. , 0. ],
       [4. , 0.1]])

In [10]: import numpy, scipy.stats

In [27]: (scipy.stats.cov(y, x)
          / numpy.sqrt(scipy.stats.var(y, axis=0)[:, numpy.newaxis])
          / numpy.sqrt(scipy.stats.var(x, axis=0)))
Out[27]:
array([[-1.        ,  0.70710678, -1.        ],
       [-0.70710678,  1.        , -0.70710678]])
These aren't the numbers you got, but I think you've mixed up your rows. (Element [0,0] should be 1.)
A more complicated but purely numpy solution is
In [40]: numpy.corrcoef(x.T, y.T)[numpy.arange(x.shape[1])[numpy.newaxis, :],
                                  numpy.arange(y.shape[1])[:, numpy.newaxis] + x.shape[1]]
Out[40]:
array([[-1.        ,  0.70710678, -1.        ],
       [-0.70710678,  1.        , -0.70710678]])
This will be slower because it also computes the correlation of each column in x with every other column in x, which you don't want. Also, the advanced indexing used to pull out the subset of the array you want can make your head hurt.
If you're going to use numpy intensively, get familiar with its rules for broadcasting and indexing. They will help you push as much work as possible down to the C level.
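For what it's worth, here is a sketch of the same corrcoef idea without the advanced indexing (my addition, not part of the answer above): numpy.corrcoef takes a rowvar argument, so you can treat columns as variables and slice out the cross block with basic slicing.

import numpy

x = numpy.array([[ 0. ,  0. ,  0. ],
                 [-1. ,  0. , -1. ],
                 [-2. ,  0. , -2. ],
                 [-3. ,  0. , -3. ],
                 [-4. ,  0.1, -4. ]])
y = numpy.array([[0. , 0. ],
                 [1. , 0. ],
                 [2. , 0. ],
                 [3. , 0. ],
                 [4. , 0.1]])

# With rowvar=False every column is a variable, so the combined matrix
# lists x's columns first and y's columns after them.
full = numpy.corrcoef(x, y, rowvar=False)   # shape (5, 5)
cross = full[:x.shape[1], x.shape[1]:]      # shape (3, 2); entry [i, j] = corr(x[:, i], y[:, j])
print(cross)

It still computes the within-x and within-y blocks, so the performance caveat above still applies; it only avoids the fancy indexing.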
Upvotes: 2
Reputation: 3374
As David said, you should define the correlation you're using. I don't know of any definition of correlation that gives sensible numbers when correlating empty and non-empty signals.
Upvotes: -1