Reputation: 69
I am wanting to get correlation values between two variables for each county.
I have subset my data as shown below and get the appropriate value for the individual Adams county, but am now wanting to do the other counties:
CorrData<-read.csv("H://Correlation
Datasets/CorrelationData_Master_Regression.csv")
CorrData2<-subset(CorrData, CountyName=="Adams")
dzCases<-(cor.test(CorrData2$NumVisit, CorrData2$dzdx,
method="kendall"))
dzCases
I am wanting to do a For Loop or something similar that will make the process more efficient, and so that I don't have write 20 different variable correlations for each of the 93 counties.
When I run the following in R, it doesn't give an error, but it doesn't give me the response I was hoping for either. Rather than the Spearman's Correlation for each county, it seems to be ignoring the loop portion and just giving me the correlation between the two variables for ALL counties.
CorrData<-read.csv("H:\\CorrelationData_Master_Regression.csv")
for (i in CorrData$CountyName)
{
dzCasesYears<-cor.test(CorrData$NumVisit, CorrData$dzdx,
method="spearman")
}
A very small sample of my data looks similar to this:
CountyName Year NumVisits dzdx
Adams 2010 4.545454545 1.19
Adams 2011 20.83333333 0.20
Elmore 2010 26.92307692 0.24
Elmore 2011 0 0.61
Brown 2010 0 -1.16
Brown 2011 17.14285714 -1.28
Clark 2010 25 -1.02
Clark 2011 0 1.13
Cass 2010 17.85714286 0.50
Cass 2011 27.55102041 0.11
I have tried to find a similar example online, but am not having luck!
Thank you in advance for all your help!
Upvotes: 1
Views: 8037
Reputation: 328
You are looping but not using your iterator 'i' in your code. If this makes sense with respect with what you want to do (and judging from your condition). Based on comments, you might want to make sure you are using numerics. Also, i noticed that you are not iterating into your output cor.test vector. I'm not sure a loop is the most efficient way to do it, but it will be just fine and since your started with a loop, You should have something of the kind:
dzCasesYears = list() #Prep a list to store your corr.test results
counter = 0 # To store your corr.test into list through iterating
for (i in unique(CorrData$CountyName))
{
counter = counter + 1
# Creating new variables makes the code clearer
x = as.numeric(CorrData[CorrData$CountyName == i,]$NumVisit)
y = as.numeric(CorrData[CorrData$CountyName == i,]$dzdx)
dzCasesYears[[counter]] <-cor.test(x,y,method="spearman")
}
And it's always good to put a unique there when you are iterating.
Upvotes: 3
Reputation: 789
data.table
makes operations like this very simple.
library('data.table')
CorrData <- as.data.table(read.csv("H:\\CorrelationData_Master_Regression.csv"))
CorrData[, cor(dzdx, NumVisits), CountyName]
With the sample data, it's all negative ones because there's two points per county and so the correlation is perfect. The full dataset should be more interesting!
CountyName V1
1: Adams -1
2: Elmore -1
3: Brown -1
4: Clark -1
5: Cass -1
Edit to include p values from cor.test as OP asked in the comment This is also quite simple!
CorrData[, .(cor=cor(dzdx, NumVisits),
p=cor.test(dzdx, NumVisits)$p.value),
CountyName]
...But it won't work with your sample data as two points per county is not enough for cor.test to get a p value. Perhaps you could take @smci's advice and dput
a larger subset of the data to make your question truly reproducible
Upvotes: 0