Andrew

Reputation: 688

When is a data.frame in R numeric?

I stumbled on the following problem. I have a data.frame:

A <- data.frame(let = c("A", "B", "C"), x = 1:3, y = 4:6)

The classes of its columns are

sapply(A, class)
      let         x         y 
 "factor" "integer" "integer" 
is.numeric(A$x)
[1] TRUE
is.numeric(A)
[1] FALSE

I do not understand why, although A$x and A$y are both numeric, the data.frame composed of only these two columns is not numeric:

is.numeric(A[, c("x", "y")])
[1] FALSE

Removing the factor column does not help...

B <- A
B$let <- NULL
is.numeric(B)
[1] FALSE
is.numeric(B$x)
[1] TRUE
is.numeric(B$y)
[1] TRUE

So, I tried creating a new dataset built only with the numeric columns in A. Is it numeric? No...

C <- data.frame(B$x, B$y)
is.numeric(C)
[1] FALSE
C <- data.frame(as.numeric(B$x), as.numeric(B$y))
is.numeric(C)
[1] FALSE

There must be something I'm missing here. Any help?

Upvotes: 1

Views: 478

Answers (2)

Stibu

Reputation: 15897

A data frame is always a data frame, independent of the classes of its columns. So what you get is the expected behaviour.

If you want to check whether all columns in a data frame are numeric, you can use the following code:

all(sapply(A, is.numeric))
## [1] FALSE
all(sapply(A[, c("x", "y")], is.numeric))
## [1] TRUE
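
If you do not want to name the numeric columns by hand, the same per-column check can be used to select them. The following is only a sketch: the names num_cols and A_num are made up for illustration, and vapply is used instead of sapply merely because it guarantees a logical result.

```r
# Same example data as in the question
A <- data.frame(let = c("A", "B", "C"), x = 1:3, y = 4:6)

# One TRUE/FALSE per column: is that column numeric?
num_cols <- vapply(A, is.numeric, logical(1))

names(A)[num_cols]
## [1] "x" "y"

# Keep only the numeric columns; drop = FALSE keeps the result a data frame
A_num <- A[, num_cols, drop = FALSE]
all(vapply(A_num, is.numeric, logical(1)))
## [1] TRUE
```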

A table with only numeric data can also be understood as a matrix. You can convert the numeric columns of your data frame to a matrix as follows:

M <- as.matrix(A[, c("x", "y")])
M
##      x y
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6

The matrix M is now really numeric:

is.numeric(M)
## [1] TRUE
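
One caveat with the matrix route, shown as a small sketch on the same example data: a matrix holds a single type, so converting the whole data frame, including the non-numeric let column, gives a character matrix rather than a numeric one. The name M_all is only illustrative.

```r
# Same example data as in the question
A <- data.frame(let = c("A", "B", "C"), x = 1:3, y = 4:6)

# Converting all columns: the non-numeric column forces a character matrix
M_all <- as.matrix(A)
is.numeric(M_all)
## [1] FALSE
is.character(M_all)
## [1] TRUE

# Converting only the numeric columns keeps the matrix numeric
is.numeric(as.matrix(A[, c("x", "y")]))
## [1] TRUE
```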

Upvotes: 3

akrun

Reputation: 886968

We need to apply the function to each column vector and not to the data.frame itself:

sapply(A[c("x", "y")], is.numeric)

instead of

is.numeric(A)

because, according to ?is.numeric:

Methods for is.numeric should only return true if the base type of the class is double or integer and values can reasonably be regarded as numeric (e.g., arithmetic on them makes sense, and comparison should be done via the base type).

The class of 'A' is data.frame, which is not numeric:

class(A)
#[1] "data.frame"

sapply(A, class)

By default, is.numeric returns TRUE only if the type of the object is double or integer and the object is not a factor.


Thus, is.numeric never returns TRUE for a data.frame itself; it has to be applied to a vector, i.e. an extracted column. That is why we loop over the data.frame with lapply/sapply: each column is passed to the function as a vector, and its class is the class of that column.
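
As a quick illustration of this on the example data (a sketch, nothing beyond base R is assumed): the data.frame is stored as a list of columns, so its base type is "list", while an extracted column has its own type.

```r
# Same example data as in the question
A <- data.frame(let = c("A", "B", "C"), x = 1:3, y = 4:6)

typeof(A)             # base type of the data.frame itself
#[1] "list"
is.list(A)
#[1] TRUE

typeof(A$x)           # base type of one extracted column
#[1] "integer"

is.numeric(A)         # checked on the data.frame
#[1] FALSE
is.numeric(A[["x"]])  # checked on the column vector
#[1] TRUE
```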

Upvotes: 3
