Madina Tultabayeva

Reputation: 73

Correlation between every column in R

I have a data set with 24 variables (columns) and 1000 rows. The columns represent AGE, SALARY, REGION, GENDER, etc.

I need to find the correlation between each pair of columns: (AGE, SALARY), (AGE, REGION), (AGE, GENDER), etc. That is, I need to get 23*24=552 correlations. Is there any way to write a loop or something similar and get all those correlations at once, rather than computing each one separately 552 times?
Please help! I can't do it 552 times. There must be a way!

UPDATE: I think I got what I wanted with COR <- cor(mytest[sapply(mytest, is.numeric)]), which gave me something like:

               AGE     SALARY   REGION
    AGE        1       NA       0.25
    SALARY     NA      1
    REGION     0.25    NA       1

etc., but now the problem is that it gives me NAs, which I don't need. I tried this:

> COR<-cor(mytest[sapply(mytest,is.numeric)],use="complete.obs")

but unfortunately it doesn't work; it fails with the error "no complete element pairs". How do I get the correlations without the NAs? Thanks in advance.

Upvotes: 1

Views: 3962

Answers (2)

Nick DiQuattro

Reputation: 739

I think you want a correlation matrix. Try this:

cor(yourdataframe)
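For the NA problem described in the question update, one option worth trying is `use = "pairwise.complete.obs"`, which computes each correlation from the rows where both columns of that pair are present, instead of requiring rows that are complete across every column. The sketch below is only an illustration: the small `mytest` data frame is a made-up stand-in for the asker's data, and its values are invented.

```r
# Hypothetical stand-in for the asker's data (values invented for illustration).
# Each numeric column has a missing value in a different row.
mytest <- data.frame(
  AGE    = c(25, 32, NA, 41, 29, 50),
  SALARY = c(30, NA, 45, 60, 38, 72),
  REGION = c(1, 2, 2, NA, 1, 3),
  GENDER = c("F", "M", "F", "M", "F", "M")
)

# Keep only the numeric columns, as in the question's update
num_cols <- mytest[sapply(mytest, is.numeric)]

# Each pair of columns uses only the rows where both values are present
COR <- cor(num_cols, use = "pairwise.complete.obs")
print(round(COR, 2))
```

If a pair of columns has no rows in common at all (for example, one column is entirely NA), that entry will still come out as NA, so it may be worth checking `colSums(is.na(num_cols))` first. Note also that with pairwise deletion each entry can be based on a different subset of rows.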

EDIT:

I think I misunderstood. If you want to correlate AGE with every other column, try this:

apply(yourdataframe, 2, cor, x = yourdataframe$AGE)
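One caveat with `apply()` on a data frame is that it first converts the data frame to a matrix, and if any column is non-numeric (GENDER stored as text, say) the whole matrix becomes character and `cor` fails. A minimal sketch of an alternative, assuming AGE is a numeric column of `yourdataframe`, is to filter to the numeric columns and use `sapply()`. The data frame below is a made-up example, not the asker's data.

```r
# Hypothetical example data (values invented for illustration)
yourdataframe <- data.frame(
  AGE    = c(25, 32, 47, 41, 29, 50),
  SALARY = c(30, 35, 45, 60, 38, 72),
  REGION = c(1, 2, 2, 3, 1, 3),
  GENDER = c("F", "M", "F", "M", "F", "M")
)

# Keep only numeric columns so cor() never sees character data
num_cols <- yourdataframe[sapply(yourdataframe, is.numeric)]

# Every numeric column except AGE itself
others <- num_cols[setdiff(names(num_cols), "AGE")]

# Correlate AGE with each remaining column; the result is a named numeric vector
age_cor <- sapply(others, function(col) {
  cor(num_cols$AGE, col, use = "pairwise.complete.obs")
})
print(age_cor)
```

The result has one entry per remaining numeric column, named after that column.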

Upvotes: 4

switcute617

Reputation: 27

You need to use select. Try referring to this link:

http://www.sqlskills.com/blogs/joe/exploring-column-correlation-and-cardinality-estimates/

Upvotes: -2
