dplanet

Reputation: 5403

Confidence intervals on a matrix of data in SAS

I have the following matrix of data, which I am reading into SAS:

1         5        12        19        13
6         3         1         3        14
2         7        12        19        21
22        24        21        29        18
17        15        22         9        18

It represents 5 different species of animal (the rows) in 5 different areas of an environment (the columns). I want to get a Shannon diversity index for the whole environment, so I sum over the rows (one total per column) to get:

48        54        68        79        84

Then calculate the Shannon index from this, to get:

1.5873488

What I need to do, however, is calculate a confidence interval for this Shannon index. So I want to perform a nonparametric bootstrap on the initial matrix.

Can anyone advise how this is possible in SAS?

Upvotes: 1

Views: 386

Answers (1)

itzy

Reputation: 11755

There are several ways to do this in SAS. I would use proc surveyselect to generate the bootstrap samples, and then calculate the Shannon Index for each replicate. (I didn't know what the Shannon Index was, so my code is just based on what I read on Wikipedia.)

data animals;
    input v1-v5;
    cards;
1         5        12        19        13
6         3         1         3        14
2         7        12        19        21
22        24        21        29        18
17        15        22         9        18
;
run;
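
As a sanity check before resampling, the index on the original data can be computed the same way. This is only a sketch: the data set names obs_sums and obs_shannon are my own, and it assumes the natural-log form of the index, which is what reproduces the 1.5873488 in the question.

```sas
/* Column totals of the original (non-resampled) data */
proc means data=animals noprint;
    var v1-v5;
    output out=obs_sums sum=;
run;

/* Shannon index of those totals: -sum(p*log(p)), natural log */
data obs_shannon;
    set obs_sums;
    array vs{*} v1-v5;
    ttl = sum(of v1-v5);
    shannon = 0;
    do i = 1 to dim(vs);
        p = vs{i} / ttl;
        /* skip empty cells, since log(0) is undefined */
        if p > 0 then shannon = shannon - p*log(p);
    end;
    keep shannon;
run;

proc print data=obs_shannon;
run;
```

This gives the point estimate that the bootstrap interval is meant to bracket.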

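/* Generate 5000 bootstrap samples, with replacement.
   outhits writes one row per draw; without it, a row drawn more than once
   appears only once (with a NumberHits count) and the sums below would miss the repeats. */
proc surveyselect data=animals method=urs n=5 reps=5000 seed=10024 outhits out=boots;
run;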

/* For each replicate, calculate the sum of each variable */
proc means data=boots noprint nway;
    class replicate;
    var v:;
    output out=sums sum=;
run;

/* Calculate the proportions, and p*log(p), which will be used next */
data sums;
    set sums;
    ttl=sum(of v1-v5);
    array ps{*} p1-p5;
    array vs{*} v1-v5;
    array hs{*} h1-h5;
    do i=1 to dim(vs);
        ps{i}=vs{i}/ttl;
        hs{i}=ps{i}*log(ps{i});
    end;
    keep replicate h:;
run;

/* Calculate the Shannon Index, again for each replicate */
data shannon;
    set sums;
    shannon = -sum(of h:);
    keep replicate shannon;
run;

We now have a data set, shannon, which contains the Shannon Index calculated for each of the 5000 bootstrap samples. You could use this to calculate p-values, but if you just want percentiles of the bootstrap distribution, you can run proc means. (Use proc univariate if you want a 95% interval, as I don't think proc means can produce the 2.5 and 97.5 percentiles; it only offers a fixed set such as p1, p5, p95 and p99.)

proc means data=shannon mean p1 p5 p95 p99;
    var shannon;
run;
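
For a 95% percentile interval, something along these lines with proc univariate should work. Treat it as a sketch: the output data set name ci and the variable names ci_lower and ci_upper are just names I picked, and this is the simple percentile bootstrap interval rather than a bias-corrected one.

```sas
/* 2.5th and 97.5th percentiles of the bootstrap distribution */
proc univariate data=shannon noprint;
    var shannon;
    output out=ci pctlpts=2.5 97.5 pctlpre=ci_ pctlname=lower upper;
run;

/* ci has one row with the variables ci_lower and ci_upper */
proc print data=ci;
run;
```

Changing pctlpts (for example to 5 95) gives other coverage levels from the same replicates.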

Upvotes: 2
