Tim

Reputation: 99468

Why does MATLAB's signrank function return the same signed rank statistic when the signs of the data points are flipped?

I have a sequence of data points stored in a vector x. I use signrank(x) to run a signed rank test.

The MATLAB documentation says:

When you use the test for one sample, then W is the sum of the ranks of positive differences between the observations and the hypothesized median value M0 (which is 0 when you use signrank(x) and m when you use signrank(x,m)).

So I think the results of signrank(x) and signrank(-x) should be different. But I have tried some examples, and I get the same signed rank statistic value for x and -x. How is the signed rank statistic defined in MATLAB's signrank function?

Thanks!

Upvotes: 3

Views: 1204

Answers (2)

Stuart

Reputation: 885

Thanks! Actually the statistic is the minimum between the sum of the ranks of positive differences and the sum of the ranks of negative differences. I don't understand why it takes the minimum. Do you?

Interesting question, and thanks for the link to the MATLAB code. Yes, that had me scratching my head for a few minutes too. They certainly do it in a roundabout manner, presumably for computational efficiency. Surprisingly, however, it does actually compute the signed rank statistic, exactly as posted previously.

Here's how it works (I've pasted the relevant few lines of code below for reference).

Let me denote P as the sum of all positive ranks (ranks corresponding to positive scores), N as the sum of all the negative ranks, and finally A as the sum of all ranks regardless of sign. Clearly A = P + N. (Note that what I've denoted as "N" is the variable "w" in the actual code.)

By the arithmetic series formula, A = n*(n+1)/2. So as you said, the line min(w, n*(n+1)/2-w) is actually returning either N or P (= A-N), whichever is smaller.

But now look at the last line of the code I pasted below. Since n*(n+1)/4 = A/2, the numerator is min(N,P) - A/2.

Now if N is the minimum this returns N-(P+N)/2, which equals -(P - N)/2.

However if P is the minimum this returns P-(P+N)/2, which equals -(N - P)/2.

So in either case it really is returning the negative of half the absolute difference of the positive and negative rank sums, -|P - N|/2. Up to that constant factor of -1/2, this is precisely the simplified form posted previously,

| Sum{ sign(Xi) rank(|Xi|) } |

BTW, the reason they use the negative of the absolute difference there is simply that it saves them from having to find the complementary cdf later.

Snippet from signrank code for reference.

w = sum(tierank(neg));
w = min(w, n*(n+1)/2-w);
...
z = (w-n*(n+1)/4) / sqrt((n*(n+1)*(2*n+1) - tieadj)/24);
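
The algebra above can be checked numerically. What follows is a minimal Python sketch, not MATLAB's implementation: `average_ranks` and `rank_sums` are hypothetical helpers written for this illustration, the sample vector is made up, and zeros are simply dropped before ranking (an assumption made here to keep the example small).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums(x):
    """Return (P, N, n): positive and negative rank sums of the nonzero entries."""
    nonzero = [v for v in x if v != 0]  # assumption: zeros are discarded
    ranks = average_ranks([abs(v) for v in nonzero])
    P = sum(r for v, r in zip(nonzero, ranks) if v > 0)
    N = sum(r for v, r in zip(nonzero, ranks) if v < 0)
    return P, N, len(nonzero)


x = [1.5, -0.3, 2.2, -4.0, 0.7, 3.1]  # made-up sample data
P, N, n = rank_sums(x)
A = n * (n + 1) / 2                   # sum of all ranks, equals P + N
numerator = min(N, A - N) - A / 2     # mirrors min(w, n*(n+1)/2-w) - n*(n+1)/4

print(P, N, A)                        # 14.0 7.0 21.0
print(numerator, -abs(P - N) / 2)     # -3.5 -3.5
```

For this sample P = 14 and N = 7, so both expressions evaluate to -3.5, matching the identity min(N,P) - A/2 = -|P - N|/2.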

Edit:

Why does it take absolute value? For z to have asymptotic normality, isn't it that there should be no absolute value taken?

My understanding is that it's not actually normal, it's "folded normal". That is, folded onto the positive half of the number line. That's why the p-value is calculated as,

p = 2*(1 - normcdf(z,0,1));

(Aside). I know that in the actual code they use the negative of "z" to avoid requiring the cdf-complement there, but it's the same thing.

The p value is multiplied by two to account for the folded distribution. Conveniently, this also works out exactly the same as calling it a "two tailed" p value.

Think for a moment about what would happen if we didn't use the absolute value here. Say we took P-N and N was greater than P. In this case the p value, 2*(1-normcdf(z,0,1)), would evaluate to greater than one, so that can't be a good idea. :)
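
To make that concrete, here is a small Python sketch of the same arithmetic. It is an illustration only: `normcdf` is a stand-in written with the standard library's error function, `two_sided_p` is a hypothetical helper, and the z value is made up.

```python
import math


def normcdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p(z):
    """Two-sided p-value 2*(1 - Phi(|z|)); folding z keeps it in [0, 1]."""
    return 2.0 * (1.0 - normcdf(abs(z)))


z = -1.2  # made-up value, as if N were greater than P and no absolute value were taken

p_without_abs = 2.0 * (1.0 - normcdf(z))  # exceeds 1, so not a valid probability
p_with_abs = two_sided_p(z)               # valid, and identical for z and -z
p_negated = 2.0 * normcdf(-abs(z))        # same value using -|z|, no complement needed

print(p_without_abs, p_with_abs, p_negated)
```

The last line shows the shortcut mentioned in the aside: evaluating the cdf at the negated statistic gives the same p-value without computing the complement.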

Upvotes: 2

Stuart

Reputation: 885

Why does MATLAB's signrank function return the same signed rank statistic when the signs of the data points are flipped?

Because the single-argument form of signrank, e.g. signrank(x), returns the p-value (pval) of a two-sided test of the null hypothesis Prob(x>0)==0.5, that is, how probable a statistic at least this extreme would be if the null hypothesis held.

And by symmetry, a two-sided test of Prob(x>0)==0.5 gives exactly the same p-value as a two-sided test of Prob(x<0)==0.5.

Update:

"Thanks! My question is: even when x is not symmetric around 0, signrank() still returns the same statistic value for both x and -x." - Tim

Yes, I understand your point of confusion there; the symmetry is not entirely obvious. The result of signrank() is essentially a measure of how compatible the data are with median(x)==0. So imagine that we made x asymmetrical, say by adding one to every element. Now the "mass" is moved so that more of it lies on the positive half of the number line, and signrank(x) will return a very small (close to zero) p-value for median(x)==0. Hopefully you can see, however, that -x is now shifted so that its mass lies more on the negative half of the number line, and so the data are equally incompatible with median(-x)==0. Hope that helps.

BTW, the actual (intermediate) statistic used in finding this p-value is:

| Sum{ sign(Xi) rank(|Xi|) } |

You can see that this is completely symmetrical for -X.
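
A quick way to convince yourself is to compute that statistic for a deliberately lopsided sample and for its negation. The Python sketch below is only an illustration: `signed_rank_sum` is a hypothetical helper, the data are made up, ties get average ranks, and zeros are dropped (assumptions made for this example, not a description of MATLAB's code).

```python
def signed_rank_sum(x):
    """Sum of sign(x_i) * rank(|x_i|) over the nonzero entries of x."""
    nonzero = [v for v in x if v != 0]  # assumption: zeros are discarded
    mags = [abs(v) for v in nonzero]
    total = 0.0
    for v, m in zip(nonzero, mags):
        less = sum(1 for u in mags if u < m)
        equal = sum(1 for u in mags if u == m)
        rank = less + (equal + 1) / 2   # average rank within a tied group
        total += rank if v > 0 else -rank
    return total


x = [2.1, 0.4, 3.3, -0.8, 1.7, 2.9]     # made-up data, mostly positive
neg_x = [-v for v in x]

s_pos = signed_rank_sum(x)              # 17.0
s_neg = signed_rank_sum(neg_x)          # -17.0

print(s_pos, s_neg)                     # 17.0 -17.0
print(abs(s_pos) == abs(s_neg))         # True
```

Flipping every sign leaves the ranks of the magnitudes untouched and only negates the sum, so the absolute value is the same for x and -x even though x is far from symmetric about zero.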

Upvotes: 1
