Polhek

Reputation: 77

Matlab's way of getting p-values for correlation

I have a vector A of size N and I want to calculate a correlation coefficient and p-value for the correlation of A with some other vector B.

I used corrcoef in Matlab, something like this:

[R, P] = corrcoef(A, B) 

As I understand it, the p-value P(1,2) comes from a t-test on the correlation R(1,2): compute the test statistic t = sqrt(N-2)*R./sqrt(1-R.^2) and then get the p-value by

P = 1 - tcdf(t, N-2). 

However, if I proceed this way, the p-value I get is not the same as the one Matlab calculated. Could someone explain why, or what I am missing in the calculation? Thanks!

EDIT: Even if I do a two-sided test (P = 2*(1-tcdf(abs(t), N-2))), my result still differs considerably from Matlab's.

Upvotes: 3

Views: 4394

Answers (3)

Greg Pelletier

Reputation: 11

The answer by wsdzbm works. Here is a Matlab example that calculates the p-values the same way corrcoef does when you only know the correlation coefficient matrix R and the number of samples N. Compare p_check (the manual calculation) with p (from corrcoef); the off-diagonal entries agree:

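load hospital                                 % sample dataset from the Statistics Toolbox
X = [hospital.Weight hospital.BloodPressure]; % three columns: weight, systolic, diastolic
[R, p] = corrcoef(X)                          % reference p-values from corrcoef
N = size(X,1);                                % number of samples
t = sqrt(N-2).*R./sqrt(1-R.^2);               % t statistic for each pair (Inf on the diagonal)
s = tcdf(t,N-2);                              % CDF of t with N-2 degrees of freedom
p_check = 2 * min(s,1-s)                      % two-sided p-value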
% R =
% 1.0000e+00   1.5579e-01   2.2269e-01
% 1.5579e-01   1.0000e+00   5.1184e-01
% 2.2269e-01   5.1184e-01   1.0000e+00
% p =
% 1.0000e+00   1.2168e-01   2.5953e-02
% 1.2168e-01   1.0000e+00   5.2460e-08
% 2.5953e-02   5.2460e-08   1.0000e+00
% p_check =
% 0            1.2168e-01   2.5953e-02
% 1.2168e-01            0   5.2460e-08
% 2.5953e-02   5.2460e-08            0
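
In the output above, the only difference is on the diagonal: p_check shows 0 where corrcoef reports 1. That follows from R being 1 there, which makes t infinite and the tail area zero. A minimal sketch of one way to match corrcoef's diagonal, using a made-up 2x2 matrix R and sample size N purely for illustration (tcdf needs the Statistics Toolbox):

```matlab
% Illustrative correlation matrix and sample size (made-up values)
R = [1 0.3; 0.3 1];
N = 20;

t = sqrt(N-2) .* R ./ sqrt(1 - R.^2);   % diagonal is Inf because 1 - R.^2 is 0 there
s = tcdf(t, N-2);
p_check = 2 * min(s, 1-s);              % diagonal evaluates to 0

% Overwrite the diagonal so it follows the convention seen in corrcoef's output
p_check(logical(eye(size(p_check)))) = 1;
```

The off-diagonal entries are left untouched, so only the display convention changes.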

Upvotes: 0

wsdzbm

Reputation: 3670

I checked the relevant source code in Matlab and Octave for the p-value. The Octave source is clearer.

Changing

P = 2*(1-tcdf(abs(t), N-2))

to

s = tcdf(t,N-2);
P = 2 * min(s,1-s);

does the trick. You then get the same p-values as corrcoef.
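
A quick way to check this on your own data is to compute the candidate forms side by side. The sketch below uses made-up vectors A and B for illustration and also includes 2*tcdf(-abs(t), N-2), which is the same tail area written without the subtraction from 1. Since the t distribution is symmetric, I would expect all of them to agree to rounding for moderate t; any visible gap should only appear when |t| is very large and 1 - tcdf(...) loses precision.

```matlab
% Illustrative data (made up for this sketch); tcdf needs the Statistics Toolbox
A = [1 2 3 4 5 6 7 8]';
B = [2 1 4 3 7 5 8 6]';
N = numel(A);

[R, P] = corrcoef(A, B);
r = R(1,2);
t = r * sqrt((N-2) / (1 - r^2));         % scalar t statistic for the pair

s = tcdf(t, N-2);
p_min  = 2 * min(s, 1-s);                % form used in this answer
p_abs  = 2 * (1 - tcdf(abs(t), N-2));    % form from the question's EDIT
p_tail = 2 * tcdf(-abs(t), N-2);         % lower tail evaluated directly

fprintf('corrcoef: %.6g\nmin form: %.6g\nabs form: %.6g\ntail form: %.6g\n', ...
        P(1,2), p_min, p_abs, p_tail);
```

If the numbers still disagree with corrcoef, the difference is more likely in how t or N was computed (for example, NaN rows or a matrix-valued R) than in the choice between these forms.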

Upvotes: 2

zglin

Reputation: 2919

I think you may have computed the formula for your t-stat incorrectly. A basic stats page gives the formula for the t-stat as shown below.

[Image: T-stat formula, t = r*sqrt((n-2)/(1-r^2))]

It looks like you're doing an element-wise operation when one is not necessary.

Here is a test in Matlab to demonstrate this.

>> a=rand(14,1)

a =

    0.6110
    0.7788
    0.4235
    0.0908
    0.2665
    0.1537
    0.2810
    0.4401
    0.5271
    0.4574
    0.8754
    0.5181
    0.9436
    0.6377
>> b=rand(14,1)

b =

    0.0358
    0.1759
    0.7218
    0.4735
    0.1527
    0.3411
    0.6074
    0.1917
    0.7384
    0.2428
    0.9174
    0.2691
    0.7655
    0.1887

I first create two random vectors for a and b.

>> [R,p]=corrcoef(a,b)

R =

    1.0000    0.2428
    0.2428    1.0000


p =

    1.0000    0.4030
    0.4030    1.0000

R(1,2) is our rho in this case and my formula is computed exactly as above.

t=R(1,2)*sqrt((length(a)-2)/(1-R(1,2)^2))
t =

    0.8670


>> p=2*(1-tcdf(t,length(a)-2))

p =

    0.4030

This matches the value from corrcoef, so you can see that corrcoef performs a two-sided test.
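
One caveat: t happens to be positive in the run above, so 2*(1-tcdf(t, ...)) works as written. With a negative correlation the same expression exceeds 1, so abs(t) is needed. A small sketch with made-up, negatively correlated vectors (illustrative values only; tcdf needs the Statistics Toolbox):

```matlab
% Illustrative, negatively correlated data (made up for this sketch)
a = (1:10)';
b = [9 10 7 8 5 6 3 4 1 2]';
n = length(a);

[R, p] = corrcoef(a, b);
r = R(1,2);
t = r * sqrt((n-2) / (1 - r^2));         % negative because r < 0

p_no_abs = 2 * (1 - tcdf(t, n-2));       % not a valid p-value here: greater than 1
p_abs    = 2 * (1 - tcdf(abs(t), n-2));  % two-sided p-value

fprintf('t = %.4f, without abs: %.6g, with abs: %.6g, corrcoef: %.6g\n', ...
        t, p_no_abs, p_abs, p(1,2));
```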

Upvotes: 2
