Reputation: 73
I have a data set with 24 variables (columns) and 1000 rows. The columns represent AGE, SALARY, REGION, GENDER, etc.
I need to find the correlation between each pair of columns: (AGE, SALARY), (AGE, REGION), (AGE, GENDER), etc., i.e. I need to get 23*24=552 correlations. Is there a way to write a loop or something similar and get all those correlations at once, rather than computing them separately 552 times?
Please help! I can't do it 552 times. There must be a way!
UPDATE: I think I got what I wanted with
COR <- cor(mytest[sapply(mytest, is.numeric)])
and I got something like
       AGE  SALARY REGION
AGE    1    NA     0.25
SALARY NA   1      NA
REGION 0.25 NA     1
etc., but now the problem is that the result contains NAs, which I don't want. I tried this:
> COR <- cor(mytest[sapply(mytest, is.numeric)], use = "complete.obs")
but unfortunately it doesn't work; it gives the error "no complete element pairs". How do I fix this? Thanks in advance.
Upvotes: 1
Views: 3962
Reputation: 739
I think you want a correlation matrix. Try this:
cor(yourdataframe)
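A minimal sketch of that call on made-up data (the data frame `df` and its values are hypothetical, not the asker's data). It keeps only the numeric columns and passes use = "pairwise.complete.obs", so each correlation is computed from the rows where both columns are present. By contrast, "complete.obs" drops every row that has an NA in any column, which is a likely cause of the "no complete element pairs" error when no row is fully observed.

```r
# Hypothetical example data with missing values and one non-numeric column.
df <- data.frame(
  AGE    = c(25, 32, NA, 41, 38, 29),
  SALARY = c(30, NA, 45, 60, 52, NA),
  REGION = c(1, 2, 2, NA, 1, 3),
  GENDER = c("F", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns; cor() cannot handle character columns.
num <- df[sapply(df, is.numeric)]

# One call returns the full matrix of pairwise correlations.
# Each entry uses only the rows where both columns are non-missing.
COR <- cor(num, use = "pairwise.complete.obs")
print(round(COR, 2))
```

An entry can still come out as NA if a pair of columns shares fewer than two observed rows or one of them is constant on those rows.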
EDIT:
I think I misunderstood. If you want to correlate AGE with every other column, try this (all columns must be numeric):
apply(yourdataframe, 2, cor, x = yourdataframe$AGE)
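A hedged sketch of the one-column-against-the-rest idea using sapply on made-up data (`df`, `others` and `age_cor` are illustrative names, not from the question). Because x is supplied by name, each column handed over by sapply is matched to the y argument of cor().

```r
# Hypothetical numeric data with a few missing values.
df <- data.frame(
  AGE    = c(25, 32, NA, 41, 38, 29),
  SALARY = c(30, NA, 45, 60, 52, NA),
  REGION = c(1, 2, 2, NA, 1, 3)
)

# Every column except AGE.
others <- setdiff(names(df), "AGE")

# Correlate AGE with each remaining column; the result is a named vector.
# Rows where either value is missing are ignored for that pair.
age_cor <- sapply(df[others], cor, x = df$AGE,
                  use = "pairwise.complete.obs")
print(round(age_cor, 2))
```

Unlike apply, sapply does not convert the data frame to a matrix first, so it is easy to subset to the numeric columns beforehand if the data also holds text columns.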
Upvotes: 4
Reputation: 27
You need to use SELECT. Try referring to this link:
http://www.sqlskills.com/blogs/joe/exploring-column-correlation-and-cardinality-estimates/
Upvotes: -2