Reputation:
I am trying to find the correlation between two separate data sets in R. The structure of my first data set (as shown by print(matr1) in R) is:
     year   month income
[1,] "2000" "01"  "30000"
[2,] "2000" "02"  "12364"
[3,] "2000" "03"  "37485"
[4,] "2000" "04"  "2000"
[5,] "2000" "05"  "7573"
The structure of my second data set (as shown by print(matr2) in R) is:
     month_year value
[1,] "Jan 2000" "84737476"
[2,] "Feb 2000" "39450334"
[3,] "Mar 2000" "48384943"
[4,] "Apr 2000" "12345678"
[5,] "May 2000" "49595340"
Now I want to find the correlation between these two data sets, but the month and year are formatted differently in each. Also, when I ran the R command cor(matr1[,"income"], matr2[,"value"]) I got this error:
Error in cor(matr1[,"income"],matr2[,"value"]) :
'x' must be numeric
So, my question is:
Any guidance will be helpful for me as I am new to this.
Upvotes: 3
Views: 98
Reputation: 11893
Working with dates is kind of a pain, IMO. But if you already know that your rows correspond (that is, the income in row i of matr1 is for the same month and year as the value in the same row of matr2), you can get a correlation quite simply with:
cor(as.numeric(matr1[,"income"]), as.numeric(matr2[,"value"]))
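The as.numeric() calls are needed because both matrices store their entries as character strings, which is what triggers the "'x' must be numeric" error. If you are not sure the rows line up, one possible approach is to build a common key from the date columns and join on it before correlating. The following is only a sketch under assumptions: the matrices are rebuilt from the printed output in the question, and the names key1, key2, df1, df2 and both are made up for illustration.

```r
# Rebuild the two character matrices as printed in the question.
matr1 <- cbind(year   = rep("2000", 5),
               month  = c("01", "02", "03", "04", "05"),
               income = c("30000", "12364", "37485", "2000", "7573"))
matr2 <- cbind(month_year = c("Jan 2000", "Feb 2000", "Mar 2000",
                              "Apr 2000", "May 2000"),
               value      = c("84737476", "39450334", "48384943",
                              "12345678", "49595340"))

# Build a "YYYY-MM" key for the first data set.
key1 <- paste(matr1[, "year"], matr1[, "month"], sep = "-")

# Build the same kind of key for the second data set.
# Looking the abbreviation up in the built-in month.abb vector
# avoids locale-dependent date parsing.
parts <- strsplit(matr2[, "month_year"], " ", fixed = TRUE)
mon   <- sprintf("%02d", match(sapply(parts, `[`, 1), month.abb))
yr    <- sapply(parts, `[`, 2)
key2  <- paste(yr, mon, sep = "-")

# Convert the character columns to numeric while building data frames.
df1 <- data.frame(key = key1, income = as.numeric(matr1[, "income"]))
df2 <- data.frame(key = key2, value  = as.numeric(matr2[, "value"]))

# Inner join on the key: only months present in both sets are kept.
both <- merge(df1, df2, by = "key")
r <- cor(both$income, both$value)
print(r)
```

With the sample data the rows already correspond, so this should give the same result as the one-liner above. The join only starts to matter when months are missing or ordered differently in one of the sets.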
Upvotes: 2