Reputation: 1502
I want to calculate the correlation between my dependent variable y and all of my x variables. I use the code below:
cor(loan_data_10v[sapply(loan_data_10v, is.numeric)],use="complete.obs")
The result is a full correlation matrix. How can I get just the one column for my variable y?
Upvotes: 16
Views: 29534
Reputation: 4708
Another option is the corrr package, where you can easily specify the variable you want to focus on. It returns a data frame (a tibble):
library(tidyverse)
library(corrr)
mtcars %>%
correlate() %>%
focus(mpg)
# Correlation computed with
# • Method: 'pearson'
# • Missing treated using: 'pairwise.complete.obs'
# # A tibble: 10 × 2
# term mpg
# <chr> <dbl>
# 1 cyl -0.852
# 2 disp -0.848
# 3 hp -0.776
# 4 drat 0.681
# 5 wt -0.868
# 6 qsec 0.419
# 7 vs 0.664
# 8 am 0.600
# 9 gear 0.480
# 10 carb -0.551
It's also useful if you want to remove non-numeric variables first, e.g.:
iris %>%
select_if(~!is.factor(.)) %>%
correlate() %>%
focus(Petal.Width)
# Correlation computed with
# • Method: 'pearson'
# • Missing treated using: 'pairwise.complete.obs'
# # A tibble: 3 × 2
# term Petal.Width
# <chr> <dbl>
# 1 Sepal.Length 0.818
# 2 Sepal.Width -0.366
# 3 Petal.Length 0.963
Upvotes: 2
Reputation: 887881
If we are looking for the cor between 'x' and 'y', both arguments can be either a vector or a matrix (a data frame also works). Using a reproducible example, say mtcars, suppose 'y' is 'mpg' and 'x' is the other variables ('mpg' is the first column, so we use mtcars[-1] for 'x'):
cor(mtcars[-1], mtcars$mpg)
# [,1]
#cyl -0.8521620
#disp -0.8475514
#hp -0.7761684
#drat 0.6811719
#wt -0.8676594
#qsec 0.4186840
#vs 0.6640389
#am 0.5998324
#gear 0.4802848
#carb -0.5509251
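As a cross-check, the same column can also be pulled out of the full correlation matrix that the question already computes. The sketch below assumes the target column is named "mpg" and compares both routes on mtcars; the variable names `full`, `from_matrix`, and `direct` are only illustrative.

```r
# Route 1: compute the full correlation matrix, then keep only the 'mpg'
# column and drop mpg's correlation with itself.
full <- cor(mtcars, use = "complete.obs")
from_matrix <- full[rownames(full) != "mpg", "mpg"]

# Route 2: the two-argument call from the answer above.
# [, 1] turns the one-column matrix into a named vector.
direct <- cor(mtcars[-1], mtcars$mpg, use = "complete.obs")[, 1]

# Both routes should agree up to floating-point tolerance.
same <- isTRUE(all.equal(from_matrix, direct))
print(same)
```

The two-argument call avoids computing correlations between the x variables that are never used, which may matter on wide data sets.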
If we have a mix of numeric and non-numeric columns, create an index of the numeric columns ('i1'), get the names of the 'x' and 'y' variables using this index, and apply cor:
i1 <- sapply(loan_data_10v, is.numeric)
y1 <- "dep_column" #change it to actual column name
x1 <- setdiff(names(loan_data_10v)[i1], y1)
cor(loan_data_10v[x1], loan_data_10v[[y1]])
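The steps above could be wrapped in a small helper. This is only a sketch: the function name `cor_with_target`, the `toy` data frame, and the choice to sort by absolute correlation are assumptions for illustration, not part of the original answer. The `use` argument is passed through to cor so that missing values can be handled as in the question (`use = "complete.obs"`).

```r
# Hypothetical helper: correlation of one numeric target column with every
# other numeric column of a data frame.
cor_with_target <- function(df, target, use = "complete.obs") {
  stopifnot(target %in% names(df), is.numeric(df[[target]]))
  # Logical index of numeric columns (same idea as 'i1' above)
  is_num <- vapply(df, is.numeric, logical(1))
  predictors <- setdiff(names(df)[is_num], target)
  res <- cor(df[predictors], df[[target]], use = use)
  # Convert the one-column matrix to a named vector
  out <- setNames(res[, 1], predictors)
  # Strongest correlations (by absolute value) first
  out[order(abs(out), decreasing = TRUE)]
}

# Made-up data with one missing value and one non-numeric column
toy <- data.frame(
  y   = c(1, 2, 3, 4, 5, 6),
  a   = c(2, 4, 6, 8, 10, NA),
  b   = c(6, 5, 3, 4, 2, 1),
  grp = c("u", "v", "u", "v", "u", "v"),
  stringsAsFactors = FALSE
)

result <- cor_with_target(toy, "y")
print(result)
```

With `use = "complete.obs"`, any row containing an NA in the selected columns is dropped for all pairs; `use = "pairwise.complete.obs"` would instead drop rows per pair, which is what corrr reports using by default.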
Upvotes: 24