user31888

Reputation: 421

data.matrix() modifies first column of the data frame in R

I have a data frame like so:

>df
         classA  classB  classC  classD
item1         0       0      34       6
item2         2      12     267      12
item3        45      26       3    5876
item4        23     110     674      17
item5         1      14      98      17
>class(df)
[1] "data.frame"
>typeof(df)
[1] "list"
>is.factor(df)
[1] FALSE

When I convert it to a numeric matrix (to do some operations on it), values of the first column (only) are changed.

>data.matrix(df)
          classA  classB  classC  classD
 item1         1       0      34       6
 item2         3      12     267      12
 item3        59      26       3    5876
 item4        34     110     674      17
 item5         2      14      98      17

I don't get it. Where do these numbers come from? How can I convert the data frame to a numeric matrix properly?

Upvotes: 1

Views: 682

Answers (2)

Cettt

Reputation: 11981

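I would guess that the first column of df is a factor (you can check with is.factor(df[, 1])). Note that is.factor(df) tests the data frame itself, not its columns, so it returns FALSE either way. data.matrix() converts a factor to its internal integer codes rather than to the numbers shown in its labels, which is why you get different values.

One way around this is to convert the first column to numeric first, going through character so that the labels are used: as.numeric(as.character(df[, 1])). Using as.matrix() instead avoids the codes, but with a factor column it returns a character matrix rather than a numeric one.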
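
To make this concrete, here is a minimal sketch. It assumes classA was read in as a factor, which is only a guess about the asker's data, and the small data frame is a hypothetical reconstruction with two columns. In the question the codes reach 59 for only five rows, which suggests the original factor had many more levels than the rows shown.

```r
# Hypothetical reconstruction: classA ended up as a factor, classB is numeric
df <- data.frame(
  classA = factor(c("0", "2", "45", "23", "1")),
  classB = c(0, 12, 26, 110, 14),
  row.names = paste0("item", 1:5)
)

# data.matrix() replaces the factor by its internal integer codes.
# Levels are sorted as strings ("0", "1", "2", "23", "45"),
# so the values 0, 2, 45, 23, 1 become the codes 1, 3, 5, 4, 2.
codes <- data.matrix(df)[, "classA"]

# Go through character so the labels, not the codes, become numbers
df$classA <- as.numeric(as.character(df$classA))
m <- data.matrix(df)
```

After the conversion, m[, "classA"] holds the original values 0, 2, 45, 23, 1, and is.numeric(m) is TRUE.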

Upvotes: 1

Saurabh Chauhan

Reputation: 3221

You should use as.matrix:

> df
         ClassA ClassB ClassC ClassD
    1      0      0     34      6
    2      2     12    267     12
    3     45     26      3   5876
    4     23    110    674     17
    5      1     98     98     17
> as.matrix(df)
       ClassA ClassB ClassC ClassD
[1,]      0      0     34      6
[2,]      2     12    267     12
[3,]     45     26      3   5876
[4,]     23    110    674     17
[5,]      1     98     98     17
> class(as.matrix(df))
[1] "matrix"
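
One caveat worth checking before relying on this: as.matrix() only gives a numeric matrix when every column is numeric. The sketch below uses a small hypothetical data frame (not the asker's data) to show both cases. Inspecting the column classes first tells you which one you are in.

```r
# Hypothetical data frame in which every column is numeric
df <- data.frame(ClassA = c(0, 2, 45), ClassB = c(0, 12, 26))

col_classes <- sapply(df, class)  # inspect column types before converting
m_num <- as.matrix(df)            # numeric matrix, values unchanged

# If one column is a factor, as.matrix() returns a character matrix instead
df$ClassA <- factor(df$ClassA)
m_chr <- as.matrix(df)

is.numeric(m_num)    # TRUE
is.character(m_chr)  # TRUE
```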

Upvotes: 2
