mccurcio

Reputation: 1344

'x' must be a numeric vector: Error from data.frame of numbers

I am running a cor.test on two columns within a file/table.

tmp <- read.table(files_to_test[i], header=TRUE, sep="\t")
## Obtain Columns To Compare ##
colA <-tmp[compareA]
colB <-tmp[compareB]
# sctr = 'spearman cor.test result'
sctr <- cor.test(colA, colB, alternative="two.sided", method="spearman")

But I am getting this confusing error:

Error in cor.test.default(colA, colB, alternative = "two.sided", method = "spearman") : 
'x' must be a numeric vector

The values in the columns ARE numbers, but

is.numeric(colA) = FALSE 
class (colA) = data.frame

What have I missed?

Upvotes: 2

Views: 64781

Answers (2)

John

Reputation: 23758

Put a comma before your selector. When you index a data.frame with a single index and no comma, it is treated as a list, and single brackets on a list return a sub-list of the same type. Therefore, the result is still a data.frame. But data.frame objects also allow matrix-style notation, and with that you get a simple vector. So just change

colA <-tmp[compareA]
colB <-tmp[compareB]

to

colA <-tmp[,compareA]
colB <-tmp[,compareB]

I think this is more in keeping with the spirit of the data.frame type than double-bracket ([[) selectors, which do something similar but in the spirit of the underlying list type. They are also unrelated to individual item and row selectors. So, in code that does several kinds of things with the data.frame, the double-bracket selectors stand out as a bit of an odd duck.
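A minimal sketch of the difference, assuming a small made-up data frame in place of the question's tmp and a hypothetical column name in compareA. It relies on the default drop = TRUE of base data.frame indexing; with drop = FALSE the matrix-style form keeps the data.frame.

```r
# Hypothetical stand-ins for the question's `tmp` and `compareA`.
tmp <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 1, 4, 3, 5))
compareA <- "a"

# List-style indexing (no comma): still a one-column data.frame.
single <- tmp[compareA]

# Matrix-style indexing (with comma): a plain numeric vector,
# because drop = TRUE is the default when one column is selected.
matrix_style <- tmp[, compareA]

# drop = FALSE keeps the data.frame even with the comma.
kept <- tmp[, compareA, drop = FALSE]

class(single)
is.numeric(matrix_style)
class(kept)
```

If compareA could ever hold more than one column name, the comma form returns a data.frame again, so it may be worth checking is.numeric() on the result before passing it to cor.test.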

Upvotes: 10

Ben Bolker

Reputation: 226172

Try tmp[[compareA]] and tmp[[compareB]] instead of single brackets. You wanted to extract numeric vectors; what you did instead was extract single-column data frames. Compare the following:

> z <- data.frame(a=1:5,b=1:5)
> str(z["a"])
'data.frame':   5 obs. of  1 variable:
 $ a: int  1 2 3 4 5
> is.numeric(z["a"])
[1] FALSE
> str(z[["a"]])
 int [1:5] 1 2 3 4 5
> is.numeric(z[["a"]])
[1] TRUE

Try these out with cor.test:

Single brackets: error as above.

> cor.test(z["a"],z["b"])
Error in cor.test.default(z["a"], z["b"]) : 'x' must be a numeric vector

Double brackets: works.

> cor.test(z[["a"]],z[["b"]])

    Pearson's product-moment correlation

data:  z[["a"]] and z[["b"]] 
[snip snip snip]

As @Aaron points out below, cor will handle single-column data frames fine, by converting them to matrices -- but cor.test doesn't. (This could be brought up on [email protected] , or ?? submitted to the R bug tracker as a wish list item ...)
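The contrast between cor and cor.test can be checked with a short sketch. The data frame d below is made up for illustration, and tryCatch is used only so the script keeps running after the expected error.

```r
# Made-up example data; not from the question's files.
d <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 1, 4, 3, 5))

# cor() accepts single-column data frames and returns a 1 x 1 matrix.
r_matrix <- cor(d["a"], d["b"], method = "spearman")

# cor.test() rejects the same input; capture the error message
# instead of stopping the script.
res <- tryCatch(
  cor.test(d["a"], d["b"], method = "spearman"),
  error = function(e) conditionMessage(e)
)

# Extracting numeric vectors with [[ ]] makes cor.test() work.
sctr <- cor.test(d[["a"]], d[["b"]],
                 alternative = "two.sided", method = "spearman")

r_matrix
res
sctr$estimate
```

Here res ends up holding the error text rather than a test result, which makes it easy to see that only the input type changed between the failing and the working call.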

See also: Numeric Column in data.frame returning "num" with str() but not is.numeric() , What's the biggest R-gotcha you've run across? (maybe others)

Upvotes: 4
