seb-29
seb-29

Reputation: 41

Significance value for correlation computed with survey package in R

I would like to obtain the significance value for a correlation computed on complex survey data in R! As far as I know, correlations are calculated as follows with the survey package:

var <- svyvar(~var1+var2, design, na.rm=TRUE)
cov2cor(as.matrix(var))

This only provides the correlation coefficients, though. Hence my question: How can I get the significance value for such a correlation? I know that this might be done by passing the sig.stats = TRUE argument to the svycor function in the jtools package:

svycor(~var1+var2,
       design,
       na.rm = TRUE,
       sig.stats = TRUE,
       bootn = 1000)

This, however, relies on the wtd.cor function in the weights package - which I somehow have troubles installing, due to some issue with the gdata dependency. Does anyone have a workaround for this? Thanks a lot in advance!

Upvotes: 1

Views: 258

Answers (1)

Thomas Lumley
Thomas Lumley

Reputation: 2765

The easiest way to get this significance test is

summary(svyglm(var2~var1, design=design, na.rm=TRUE))

The null hypothesis that the correlation is zero is exactly the null hypothesis that there is no linear component to the relationship between the variables. This gives you very close to the test you could do if svycontrast worked on svyvar output

That is, assume

svycontrast.svyvar<-function(stats, ...){
    s<-as.vector(as.matrix(stats))
    nms<-as.vector(outer(rownames(stats),colnames(stats),paste,sep=":"))
    v<-vcov(stats)
    names(s)<-nms
    dimnames(v)<-list(nms,nms)
    attr(s,"var")<-v
    attr(s,"statistic")<-"variance"
    class(s)<-"svystat"
    svycontrast(s,...)
}

then with the built-in dclus1 data set as an example

> summary(svyglm(api00~ell,design=dclus1))

Call:
svyglm(formula = api00 ~ ell, design = dclus1)

Survey design:
svydesign(id = ~dnum, fpc = ~fpc, data = apiclus1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  746.926     26.900   27.77 5.85e-13 ***
ell           -3.721      0.467   -7.97 2.33e-06 ***

and

> b<-svyvar(~api00+ell,dclus1)
> svycontrast(b, quote(`api00:ell`/sqrt(`api00:api00`*`ell:ell`)))
           nlcon     SE
contrast -0.5941 0.0778
> -0.5941/ 0.0778
[1] -7.636247

for basically the same z-statistic that you could then turn into a p-value.

This is better than using the weights package, because weights::wtd.cor doesn't account for clustering or stratification, only for weights.

Upvotes: 0

Related Questions