Reputation: 77
I have a vector A of size N and I want to calculate a correlation coefficient and p-value for the correlation of A with some other vector B.
I used corrcoef in Matlab, something like this:
[R, P] = corrcoef(A, B)
And from what I understand, doing a t-test for this correlation, R(1,2), to get a p-value equal to P(1,2) would mean calculating the test statistic
t = sqrt(N-2)*R./sqrt(1-R.^2)
and getting the p-value by
P = 1 - tcdf(t, N-2).
However, if I proceed this way, the p-value I get is not the same as the one MATLAB calculates. Could someone explain why, or what I am missing in the calculation? Thanks!
EDIT: Even if I do a two-sided test, P = 2*(1-tcdf(abs(t), N-2)), there are still large differences between my result and MATLAB's.
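For reference, here are the statistic and both p-value variants written out, with r = R(1,2) and F_{N-2} denoting the CDF of Student's t-distribution with N-2 degrees of freedom (F_{N-2} is notation introduced here, not something from corrcoef):

```latex
t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}, \qquad
p_{\text{one}} = 1 - F_{N-2}(t), \qquad
p_{\text{two}} = 2\bigl(1 - F_{N-2}(|t|)\bigr) = 2\min\bigl(F_{N-2}(t),\; 1 - F_{N-2}(t)\bigr)
```

The last equality follows from the symmetry F(-x) = 1 - F(x). For r > 0 the one-sided value is exactly half the two-sided one, and corrcoef reports the two-sided value.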
Upvotes: 3
Views: 4394
Reputation: 11
The answer by wsdzbm works. Here is MATLAB code showing how to calculate the p-values the same way corrcoef does when you only know the correlation coefficient matrix R and the number of samples N (compare p_check, the manual calculation below, with p from corrcoef):
load hospital
X = [hospital.Weight hospital.BloodPressure];
[R, p] = corrcoef(X)
N = size(X,1);
t = sqrt(N-2).*R./sqrt(1-R.^2);
s = tcdf(t,N-2);
p_check = 2 * min(s,1-s)
% R =
% 1.0000e+00 1.5579e-01 2.2269e-01
% 1.5579e-01 1.0000e+00 5.1184e-01
% 2.2269e-01 5.1184e-01 1.0000e+00
% p =
% 1.0000e+00 1.2168e-01 2.5953e-02
% 1.2168e-01 1.0000e+00 5.2460e-08
% 2.5953e-02 5.2460e-08 1.0000e+00
% p_check =
% 0 1.2168e-01 2.5953e-02
% 1.2168e-01 0 5.2460e-08
% 2.5953e-02 5.2460e-08 0
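If you need this in several places, the same calculation can be wrapped in a small function. This is a sketch under a few assumptions: the name corr_pvalue is hypothetical (it is not a MATLAB function), R is assumed to come from N complete observations, and the diagonal is set to 1 to mirror the p matrix corrcoef prints above. It uses 2*tcdf(-abs(t), N-2), the lower-tail form of the same two-sided p-value, which avoids subtracting from 1 when p is tiny.

```matlab
function p = corr_pvalue(R, N)
% CORR_PVALUE  Hypothetical helper: two-sided p-values for Pearson
% correlation coefficients R estimated from N observations.
% Requires tcdf (Statistics and Machine Learning Toolbox).

    % Clip to [-1, 1] so round-off slightly beyond +/-1 cannot make
    % 1 - R.^2 negative (which would produce complex results).
    R = max(min(R, 1), -1);

    % t-statistic with N-2 degrees of freedom; element-wise so R may be a matrix
    t = sqrt(N - 2) .* R ./ sqrt(1 - R.^2);

    % Two-sided p-value: twice the lower tail beyond |t|
    p = 2 * tcdf(-abs(t), N - 2);

    % Mirror corrcoef, which reports 1 on the diagonal of a square R matrix
    if size(R, 1) > 1 && size(R, 1) == size(R, 2)
        p(logical(eye(size(R, 1)))) = 1;
    end
end
```

Saved as corr_pvalue.m, calling p2 = corr_pvalue(R, N) with the hospital data above should agree with p from corrcoef up to round-off.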
Upvotes: 0
Reputation: 3670
I checked the relevant source code of MATLAB and Octave for the p-value computation. The Octave source is clearer.
Changing
P = 2*(1-tcdf(abs(t), N-2))
to
s = tcdf(t,N-2);
P = 2 * min(s,1-s);
does the trick. You then get the same p-values as corrcoef.
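Because the t-distribution is symmetric, tcdf(-x) = 1 - tcdf(x), so 2*min(s,1-s) is algebraically the same as 2*(1-tcdf(abs(t),N-2)): for t >= 0 the minimum is 1-s, and for t < 0 it is s = 1 - tcdf(-t). In floating point the two can differ only when t is negative and p is very small, because 1 - tcdf(abs(t), N-2) subtracts two numbers close to 1 while s is the small tail itself. A quick check over a grid of t values (nu = 12 degrees of freedom is an arbitrary choice for illustration):

```matlab
% Compare three forms of the two-sided p-value over a grid of t-statistics.
nu = 12;                       % degrees of freedom (arbitrary, for illustration)
t  = linspace(-6, 6, 121);     % test statistics of both signs

s       = tcdf(t, nu);
p_min   = 2 * min(s, 1 - s);             % form from the answer above
p_abs   = 2 * (1 - tcdf(abs(t), nu));    % form from the question
p_lower = 2 * tcdf(-abs(t), nu);         % lower-tail form, no subtraction from 1

% Largest absolute disagreement between the forms; expect round-off only
d1 = max(abs(p_min - p_abs));
d2 = max(abs(p_min - p_lower));
fprintf('max |p_min - p_abs|   = %g\n', d1);
fprintf('max |p_min - p_lower| = %g\n', d2);
```

With much larger |t| (a very strong correlation over many samples), p_abs can round to exactly 0, whereas p_lower keeps the tiny tail probability for either sign of t.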
Upvotes: 2
Reputation: 2919
I think you may have computed the t-statistic incorrectly. A basic statistics reference gives the t-statistic for a sample correlation coefficient rho from n observations as
t = rho*sqrt((n-2)/(1-rho^2))
It looks like you're doing an element-wise operation when one is not necessary.
Here is a test in MATLAB to demonstrate this:
>> a=rand(14,1)
a =
0.6110
0.7788
0.4235
0.0908
0.2665
0.1537
0.2810
0.4401
0.5271
0.4574
0.8754
0.5181
0.9436
0.6377
>> b=rand(14,1)
b =
0.0358
0.1759
0.7218
0.4735
0.1527
0.3411
0.6074
0.1917
0.7384
0.2428
0.9174
0.2691
0.7655
0.1887
I first create two random vectors, a and b.
>> [R,p]=corrcoef(a,b)
R =
1.0000 0.2428
0.2428 1.0000
p =
1.0000 0.4030
0.4030 1.0000
R(1,2) is our rho in this case, and I compute t with exactly the formula above:
t=R(1,2)*sqrt((length(a)-2)/(1-R(1,2)^2))
t =
0.8670
>> p=2*(1-tcdf(t,length(a)-2))
p =
0.4030
This shows that corrcoef performs a two-sided test.
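One caveat: the p-value line above uses 2*(1-tcdf(t, ...)) without abs, which works here only because t is positive. For a negative correlation t is negative, and that expression returns a value greater than 1. A sketch with made-up data that is negatively correlated by construction (the seed, noise level, and variable names are arbitrary choices for illustration):

```matlab
% Two-sided p-value for a negative correlation, checked against corrcoef.
rng(0);                          % fixed seed so the example is repeatable
a = rand(14, 1);
b = -a + 0.2 * randn(14, 1);     % negatively correlated with a by construction

[R, p] = corrcoef(a, b);
r  = R(1, 2);                    % sample correlation (negative here)
df = length(a) - 2;              % degrees of freedom
t  = r * sqrt(df / (1 - r^2));   % same t-statistic formula as above

p_noabs = 2 * (1 - tcdf(t, df));       % exceeds 1 whenever t < 0
p_abs   = 2 * (1 - tcdf(abs(t), df));  % two-sided test using |t|

fprintf('corrcoef p = %g, with abs = %g, without abs = %g\n', ...
        p(1, 2), p_abs, p_noabs);
```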
Upvotes: 2