Debutant

Reputation: 357

Correlation for data in matrix format in R

I have created a matrix in R and I want to investigate the correlation between two columns. My_matrix is:

         speed motor rpm acceleration age
cadillac     3        42           67  22
porche       5        40           68  21
ferrari      7        37           69  20
peugeot     10        32           70  19
kia         12        28           71  18

When I try cor(speed~age, data=My_matrix), I get the following error:

Error in cor(speed ~ age, data = My_matrix) : unused argument (data = My_matrix)

Any idea how I can address this? Thanks.

Upvotes: 1

Views: 353

Answers (3)

Shawn Janzen

Reputation: 437

There are some great base R solutions here already (hats off to @akrun and @Debutant, base R is great!). I would like to add a few alternatives for future readers who prefer a different coding style.

If you don't like typing quotation marks and the dataset is small enough, column numbers can be faster to type, although quoted column names are safer (especially if the columns are later reordered).

@mikey in the comments offered a column number solution, here is an alternate version:

cor(My_matrix[,c(1,4)])
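To illustrate the trade-off between column numbers and column names, here is a minimal sketch. It assumes the matrix can be rebuilt from the values printed in the question; the names by_index, by_name, and reordered are only illustrative.

```r
# Rebuild the matrix from the question (values copied from the post)
My_matrix <- matrix(
  c( 3, 42, 67, 22,
     5, 40, 68, 21,
     7, 37, 69, 20,
    10, 32, 70, 19,
    12, 28, 71, 18),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("cadillac", "porche", "ferrari", "peugeot", "kia"),
    c("speed", "motor rpm", "acceleration", "age")
  )
)

# Both forms select the same two columns while the order is unchanged
by_index <- cor(My_matrix[, c(1, 4)])
by_name  <- cor(My_matrix[, c("speed", "age")])
identical(by_index, by_name)

# After a reorder, positions 1 and 4 point at different variables,
# while the name-based version would still pick speed and age
reordered <- My_matrix[, c("age", "speed", "acceleration", "motor rpm")]
colnames(reordered)[c(1, 4)]
```

With the reordered matrix, positions 1 and 4 are "age" and "motor rpm", so the index-based call would silently correlate the wrong pair.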

If your data is a data frame instead of a matrix, you might enjoy a tidyverse approach, which also does not require quotation marks (although variables with spaces in their names need to be wrapped in backticks, e.g. `motor rpm`):

library(dplyr)
My_dataframe %>% select(speed, age) %>% cor()
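For a column whose name contains a space, such as motor rpm in the question, a sketch of the backtick form might look like this. It assumes dplyr is installed and that My_dataframe is built by hand from the posted values; check.names = FALSE is used so that data.frame() keeps the space in the name.

```r
library(dplyr)

# Hypothetical data frame holding three of the question's columns;
# check.names = FALSE stops data.frame() from renaming "motor rpm"
My_dataframe <- data.frame(
  speed       = c(3, 5, 7, 10, 12),
  `motor rpm` = c(42, 40, 37, 32, 28),
  age         = c(22, 21, 20, 19, 18),
  check.names = FALSE
)

# Backticks let select() refer to a name that contains a space
rpm_age <- My_dataframe %>% select(`motor rpm`, age) %>% cor()
rpm_age
```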

@Debutant only asked for the correlation between 2 variables, but if we want the full correlation matrix, here are additional options:

# assuming all your columns are numeric as they are here
cor(My_matrix)
# if you have a data frame with different data types, select only the numeric ones
# (select_if() still works but is superseded; where() needs dplyr >= 1.0.0)
library(dplyr)
My_dataframe %>% select(where(is.numeric)) %>% cor()
# if you don't like the long decimals, toss in a round() for good measure
My_dataframe %>% select(where(is.numeric)) %>% cor() %>% round(3)

Hope you find this useful. :)

Upvotes: 0

Debutant

Reputation: 357

I also tried this and it worked. I converted the matrix to a data frame "b":

b <- as.data.frame(My_matrix)

then I computed the correlation with

cor(b$speed, b$age)

Upvotes: 1

akrun

Reputation: 886938

We can subset the columns and apply cor() directly. The usage of cor() is

cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))

and there is no formula method, which is why the data argument is rejected as unused. Subsetting by column name works:

cor(My_matrix[,c("speed", "age")])
#           speed        age
#speed  1.0000000 -0.9971765
#age   -0.9971765  1.0000000
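If a single number is wanted rather than a 2 x 2 matrix, or if the formula interface is preferred, the following sketch may help. It assumes a reduced matrix with only the speed and age columns from the question; r and ct are illustrative names. Note that cor.test() (unlike cor()) does have a formula method, which takes a one-sided formula and a data frame.

```r
# Reduced version of the question's matrix: only the two columns of interest
My_matrix <- cbind(
  speed = c(3, 5, 7, 10, 12),
  age   = c(22, 21, 20, 19, 18)
)

# Passing two vectors as x and y returns a single coefficient
r <- cor(My_matrix[, "speed"], My_matrix[, "age"])
r   # about -0.997, matching the off-diagonal entry above

# cor.test() accepts a one-sided formula, but needs a data frame
ct <- cor.test(~ speed + age, data = as.data.frame(My_matrix))
ct$estimate   # same Pearson coefficient
ct$p.value    # p-value of the test that the correlation is zero
```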

Upvotes: 1
