user9165024

Reputation: 69

For Loop for Correlations

I want to compute the correlation between two variables for each county.

I have subset my data as shown below and get the expected value for Adams County on its own, but now I want to do the same for the other counties:

    CorrData <- read.csv("H://Correlation Datasets/CorrelationData_Master_Regression.csv")
    CorrData2 <- subset(CorrData, CountyName == "Adams")
    dzCases <- cor.test(CorrData2$NumVisit, CorrData2$dzdx, method = "kendall")
    dzCases

I'd like to use a for loop, or something similar, to make the process more efficient, so that I don't have to write 20 different variable correlations for each of the 93 counties.

When I run the following in R, it doesn't give an error, but it doesn't give me the result I was hoping for either. Rather than a Spearman correlation for each county, it seems to ignore the loop and just gives me the correlation between the two variables across ALL counties.

    CorrData <- read.csv("H:\\CorrelationData_Master_Regression.csv")
    for (i in CorrData$CountyName) {
        dzCasesYears <- cor.test(CorrData$NumVisit, CorrData$dzdx,
                                 method = "spearman")
    }

A very small sample of my data looks similar to this:

    CountyName  Year  NumVisits      dzdx
    Adams       2010  4.545454545    1.19
    Adams       2011  20.83333333    0.20
    Elmore      2010  26.92307692    0.24
    Elmore      2011  0              0.61
    Brown       2010  0             -1.16
    Brown       2011  17.14285714   -1.28
    Clark       2010  25            -1.02
    Clark       2011  0              1.13
    Cass        2010  17.85714286    0.50
    Cass        2011  27.55102041    0.11
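For anyone who wants to try the answers, the sample can be rebuilt in R as a data frame. This is a sketch with the values copied from the table above; the column is named NumVisits as in the table header, while the code above refers to NumVisit, so the names may need adjusting to match the real file.

```r
# Sample table rebuilt as a data frame (values copied from the table above;
# column names follow the table header, which may differ from the real file)
CorrData <- data.frame(
  CountyName = rep(c("Adams", "Elmore", "Brown", "Clark", "Cass"), each = 2),
  Year       = rep(c(2010, 2011), times = 5),
  NumVisits  = c(4.545454545, 20.83333333, 26.92307692, 0, 0,
                 17.14285714, 25, 0, 17.85714286, 27.55102041),
  dzdx       = c(1.19, 0.20, 0.24, 0.61, -1.16, -1.28, -1.02, 1.13, 0.50, 0.11),
  stringsAsFactors = FALSE
)
```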

I have tried to find a similar example online, but haven't had any luck.

Thank you in advance for all your help!

Upvotes: 1

Views: 8037

Answers (2)

Al3xEP

Reputation: 328

You are looping, but you never use the iterator i in the loop body, so every iteration runs the same test on the whole data set. You are also assigning every cor.test result to the same variable, so each iteration overwrites the previous one instead of storing it. Based on the comments, you might also want to make sure your columns are numeric. I'm not sure a loop is the most efficient way to do this, but it works fine, and since you started with a loop, you could write something like this:

    dzCasesYears <- list()  # Prepare a list to store the cor.test results
    counter <- 0            # Position in the list, incremented once per county

    for (i in unique(CorrData$CountyName)) {
        counter <- counter + 1
        # Creating new variables makes the code clearer; as.character() comes first
        # so that a factor column converts to its values rather than its level codes
        x <- as.numeric(as.character(CorrData[CorrData$CountyName == i, ]$NumVisit))
        y <- as.numeric(as.character(CorrData[CorrData$CountyName == i, ]$dzdx))

        dzCasesYears[[counter]] <- cor.test(x, y, method = "spearman")
    }

It's also good practice to wrap the column in unique() when iterating like this; otherwise the loop visits each county once per row instead of once per county.
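If you'd rather avoid the explicit loop and counter, a base-R alternative is to split() the data by county and lapply() over the pieces. The sketch below is not a drop-in replacement: county_cor_tests is a hypothetical helper name, the column names CountyName, NumVisits and dzdx are assumed from the sample table, and it returns one row per county with the estimate and p-value rather than a list of full cor.test objects.

```r
# Hypothetical helper: one cor.test per county, using split() + lapply()
# instead of an explicit loop and counter.
# Assumes columns named CountyName, NumVisits and dzdx.
county_cor_tests <- function(df, method = "spearman") {
  # Named list with one data frame per county; drop = TRUE skips unused levels
  by_county <- split(df, df$CountyName, drop = TRUE)
  rows <- lapply(names(by_county), function(county) {
    d <- by_county[[county]]
    # suppressWarnings() hides e.g. "Cannot compute exact p-value with ties"
    ct <- suppressWarnings(cor.test(d$NumVisits, d$dzdx, method = method))
    data.frame(CountyName = county,
               estimate   = unname(ct$estimate),
               p.value    = ct$p.value,
               stringsAsFactors = FALSE)
  })
  # Stack the one-row data frames into a single summary table
  do.call(rbind, rows)
}
```

For example, county_cor_tests(CorrData, method = "kendall") would run the question's Kendall test for every county at once. Counties come back in sorted (factor-level) order, not in the order they appear in the file. With only two rows per county, as in the sample, the p-values carry no real information.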

Upvotes: 3

HarlandMason

Reputation: 789

The data.table package makes grouped operations like this very simple.

    library(data.table)
    CorrData <- as.data.table(read.csv("H:\\CorrelationData_Master_Regression.csv"))
    # j computes the correlation; by = CountyName evaluates it once per county
    CorrData[, cor(dzdx, NumVisits), by = CountyName]

With the sample data, every value is -1 because there are only two points per county, so each correlation is trivially perfect. The full data set should be more interesting!

       CountyName V1
    1:      Adams -1
    2:     Elmore -1
    3:      Brown -1
    4:      Clark -1
    5:       Cass -1

Edit: including p-values from cor.test, as the OP asked in a comment, is also quite simple:

    CorrData[, .(cor = cor(dzdx, NumVisits),
                 p   = cor.test(dzdx, NumVisits)$p.value),
             by = CountyName]

...but it won't work with your sample data, because two points per county are not enough for cor.test to compute a p-value (Pearson's test needs at least three). Perhaps you could take @smci's advice and dput a larger subset of the data to make your question truly reproducible.
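The question also mentions roughly 20 variables per county. A rough base-R sketch for that case (independent of data.table) builds a county-by-variable table of coefficients; county_cor_matrix, outcome and vars are hypothetical names, and method = "spearman" is passed explicitly because cor() defaults to Pearson, which is also what the data.table call above computes.

```r
# Hypothetical helper: correlate one outcome column with several other columns,
# separately for each county. `outcome` and `vars` are assumed column names.
county_cor_matrix <- function(df, outcome, vars, method = "spearman") {
  by_county <- split(df, df$CountyName, drop = TRUE)
  per_county <- lapply(by_county, function(d) {
    # One coefficient per variable; vapply() keeps the names from `vars`
    vapply(vars, function(v) cor(d[[outcome]], d[[v]], method = method),
           numeric(1))
  })
  # rbind() turns the named list into a matrix: rows = counties, cols = vars
  do.call(rbind, per_county)
}
```

For instance, county_cor_matrix(CorrData, "NumVisits", c("dzdx")) gives one Spearman coefficient per county; listing the real column names in vars extends it to all of them. A county whose column is constant yields NA (with a warning) in that cell.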

Upvotes: 0
