Geraldine

Reputation: 821

Correlation in R; numeric

I've dealt with R's correlation algorithm before, but I am unsure what is going on with my current code.

My input data are two .csv files. The first has only one column, which I coerced to a data.frame. It looks like this (my data are quite long time series, so I'm only showing the first 10 data points):

                  trends
         V1    0.2701541
         V2      2.00532
         V3      1.79548
         V4    0.2549123
         V5    0.2124736
         V6    -1.132594
         V7    -0.711875
         V8    -1.577067
         V9   -0.5320426
         V10    1.325005

My other file has several columns and looks as follows:

       X13_EVI     X14_EVI     X15_EVI     X18_EVI
1    1.0492437  0.54155557 -0.58480284 -3.47111922
2    1.7274555  1.46141010  0.79416226  1.04050086  
3    1.7274555  1.46141010  0.48772557  1.17721662  
4   -0.1941446 -0.14833532 -0.12514781  0.22020630  
5   -0.1941446 -0.14833532 -0.12514781  0.22020630  
6   -0.5332505 -0.60826258 -0.73802119 -0.73680402 
7   -0.4202152 -0.49328077 -0.12514781 -0.32665674 
8   -0.9853917 -1.29815348 -1.04445787 -0.73680402 
9   -0.3071799 -0.03335350  0.18128888 -0.46337250  
10   0.5971025  1.00148284  1.10059895  0.63035358

When I try to do

corr=cor(trends, all.obs)

I get the error message

Error in cor(trends, all.obs) : 'x' must be numeric

I can't remember coming across this problem before and am unable to figure out what causes it. In the past I've always been able to calculate the correlation between each observed time series (the columns in all.obs) and the trend (in this case 1 trend). I've checked

> is.numeric(trends)
[1] FALSE
> is.numeric(all.obs)
[1] FALSE
> is.data.frame(all.obs)
[1] TRUE
> is.data.frame(trends)
[1] TRUE

I also did

> typeof(all.obs)
[1] "list"
> typeof(trends)
[1] "list"

because I got

> trends=as.numeric(trends)
Error: (list) object cannot be coerced to type 'double'

It's been a while since I worked with this though, so maybe I'm missing something very obvious?

Upvotes: 0

Views: 2589

Answers (1)

Athos

Reputation: 660

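Check whether all the columns of trends and all.obs are stored as numeric.

To do that, run sapply(trends, is.numeric) and sapply(all.obs, is.numeric). If you see any FALSE in the output, coerce that column (not the whole data frame, which is what produced your "cannot be coerced" error) with as.numeric(). If the column was read in as a factor, convert it with as.numeric(as.character(x)); calling as.numeric() directly on a factor returns the internal level codes rather than the values.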
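As a minimal sketch of that check-and-coerce step: the toy data frames below are made-up stand-ins for the question's data, with the trends column deliberately stored as a factor to reproduce the error. Only the names trends and all.obs come from the question; everything else is an assumption for illustration.

```r
# Toy stand-ins for the question's data (values are illustrative only).
# The trends column is stored as a factor on purpose, which is one common
# way a column of numbers ends up non-numeric after reading a CSV.
trends <- data.frame(trends = c("0.27", "2.01", "1.80", "0.25", "0.21"),
                     stringsAsFactors = TRUE)
all.obs <- data.frame(X13_EVI = c(1.05, 1.73, 1.73, -0.19, -0.19),
                      X14_EVI = c(0.54, 1.46, 1.46, -0.15, -0.15))

# Report, per column, whether it is numeric.
sapply(trends, is.numeric)
sapply(all.obs, is.numeric)

# Coerce column by column. Going through as.character() first means factor
# columns yield their printed values instead of their integer level codes.
# Assigning into trends[] keeps the object a data frame.
trends[] <- lapply(trends, function(col) as.numeric(as.character(col)))

# With every column numeric, cor() accepts the two data frames.
corr <- cor(trends, all.obs)
```

Entries that cannot be parsed as numbers become NA with a warning, so it is worth checking anyNA(trends) afterwards to find the offending rows.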

Alternatively, a better way to avoid this kind of problem is to specify the column types when reading the csv files. You do this with the colClasses argument of the read.csv function. Example:

trends <- read.csv("PATH_TO_DATA_FOLDER/trends.csv", colClasses = "numeric")
all.obs <- read.csv("PATH_TO_DATA_FOLDER/all_obs.csv", colClasses = rep("numeric", 4))

See whether that is sufficient.
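To illustrate what colClasses does, here is a small self-contained sketch that reads CSV content from inline strings through the text argument instead of from files. The strings csv_text and bad_text are hypothetical examples, not the asker's data. A single "numeric" is recycled across all columns, and a cell that cannot be parsed as a number makes the read fail immediately rather than silently producing a non-numeric column.

```r
# Hypothetical CSV content, supplied inline so the example runs without files.
csv_text <- "X13_EVI,X14_EVI\n1.05,0.54\n1.73,1.46\n-0.19,-0.15\n"

# One "numeric" is recycled to every column.
all.obs <- read.csv(text = csv_text, colClasses = "numeric")
str(all.obs)

# A stray non-numeric cell ("n/a" here) triggers an error at read time,
# which points at the bad entry instead of hiding it in a character column.
bad_text <- "trends\n0.27\nn/a\n1.80\n"
result <- tryCatch(
  read.csv(text = bad_text, colClasses = "numeric"),
  error = function(e) conditionMessage(e)
)
print(result)
```

If the files use a placeholder for missing values, listing it in the na.strings argument of read.csv lets those cells be read as NA instead of raising an error.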

Upvotes: 3
