Jason Donnald

Reputation:

Issue finding the correlation between two data sets in R

I am trying to find the correlation between two separate data sets in R. The structure of my first data set, as shown by print(matr1), is:

        year  month  income  
 [1,]  "2000" "01"  "30000"
 [2,]  "2000" "02"  "12364"
 [3,]  "2000" "03"  "37485"
 [4,]  "2000" "04"  "2000"
 [5,]  "2000" "05"  "7573"

The structure of my second data set, as shown by print(matr2), is:

     month_year     value     
 [1,] "Jan 2000" "84737476"
 [2,] "Feb 2000" "39450334"
 [3,] "Mar 2000" "48384943"
 [4,] "Apr 2000" "12345678"
 [5,] "May 2000" "49595340"

I want to find the correlation between these two data sets, but the month and year are formatted differently in each. Also, when I ran the R command cor(matr1[,"income"],matr2[,"value"]), I got this error:

Error in cor(matr1[,"income"],matr2[,"value"]) : 
  'x' must be numeric

So, my questions are:

  1. How do I get rid of the error?
  2. How do I find the correlation when the month and year are formatted differently in the two data sets?

Any guidance would be helpful, as I am new to this.

Upvotes: 3

Views: 98

Answers (1)

gung - Reinstate Monica

Reputation: 11893

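Working with dates is kind of a pain, IMO. The error itself has nothing to do with the dates, though: both matrices hold character strings (note the quotes in the printed output), and cor() needs numeric input. If you already know that your rows correspond (that is, the income in row i of matr1 is for the same month and year as the value in row i of matr2), you can convert the columns and get a correlation quite simply with: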

cor(as.numeric(matr1[,"income"]), as.numeric(matr2[,"value"]))
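If the rows are not guaranteed to line up, one option is to build a common month/year key and match on it before correlating. The following is only a sketch: it rebuilds the two matrices from the question, and it assumes that month_year always uses the English three-letter abbreviations found in R's built-in month.abb vector. The names key1, idx, and ok are illustrative, not from the question.

```r
# Rebuild the two character matrices shown in the question.
matr1 <- cbind(year   = rep("2000", 5),
               month  = c("01", "02", "03", "04", "05"),
               income = c("30000", "12364", "37485", "2000", "7573"))
matr2 <- cbind(month_year = c("Jan 2000", "Feb 2000", "Mar 2000",
                              "Apr 2000", "May 2000"),
               value      = c("84737476", "39450334", "48384943",
                              "12345678", "49595340"))

# Build a "Mon YYYY" key for matr1 (assumes English abbreviations in matr2).
key1 <- paste(month.abb[as.integer(matr1[, "month"])], matr1[, "year"])

# For each row of matr1, find the row of matr2 with the same key.
idx <- match(key1, matr2[, "month_year"])
ok  <- !is.na(idx)  # keep only months present in both data sets

# Convert the matched character columns to numeric and correlate.
income <- as.numeric(matr1[ok, "income"])
value  <- as.numeric(matr2[idx[ok], "value"])
cor(income, value)
```

With the sample data every key matches and the rows are already in the same order, so this gives the same result as the one-liner above; the matching step only matters when rows are missing or ordered differently.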

Upvotes: 2
