S4M

Reputation: 4661

Numerical values in a matrix column get modified when converting to a data.frame

Running R 2.13, I want a data.frame with several columns, the first of numeric type and the others of character type. When I create the object, the values in the first column get transformed in a way that I don't expect or understand. Please see the code below.

tmp <- cbind(1:10,rep("aa",10))

tmp

      [,1] [,2]
 [1,] "1"  "aa"
 [2,] "2"  "aa"
 [3,] "3"  "aa"
 [4,] "4"  "aa"
 [5,] "5"  "aa"
 [6,] "6"  "aa"
 [7,] "7"  "aa"
 [8,] "8"  "aa"
 [9,] "9"  "aa"
[10,] "10" "aa"

tmp <- data.frame(tmp)

tmp

   X1 X2
1   1 aa
2   2 aa
3   3 aa
4   4 aa
5   5 aa
6   6 aa
7   7 aa
8   8 aa
9   9 aa
10 10 aa

tmp[,1] <- as.numeric(tmp[,1])

tmp

   X1 X2
1   1 aa
2   3 aa
3   4 aa
4   5 aa
5   6 aa
6   7 aa
7   8 aa
8   9 aa
9  10 aa
10  2 aa

For some reason, the values of the first column are getting changed. I must be doing something obviously wrong here; can someone point me to a workaround?

Upvotes: 3

Views: 530

Answers (2)

Ben Bolker

Reputation: 226087

@aix's answer is a correct diagnosis. However, probably what you want to do is to create a data frame directly:

data.frame(1:10,rep("aa",10))

This avoids cbinding first (which makes a matrix) and then converting to a data frame.
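To see why the cbind step matters, here is a small sketch (the names m and df are only illustrative, not from the question). A matrix can hold a single type, so mixing numbers and strings coerces everything to character, whereas data.frame keeps each column's own type:

```r
# A matrix stores one type only, so the integers are coerced to strings.
m <- cbind(1:3, rep("aa", 3))
typeof(m)        # "character"

# data.frame keeps each column's type; the first column stays integer.
df <- data.frame(var1 = 1:3, var2 = rep("aa", 3))
class(df$var1)   # "integer"
```

Once the numbers have become strings inside the matrix, converting to a data frame can no longer recover the original numeric column by itself.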

You might want to give your variables sensible names rather than the weird ones they will end up with via the data.frame command above (X1.10 and rep..aa...10.):

data.frame(var1=1:10,var2=rep("aa",10))

Since data.frame replicates its arguments, you can shorten this even a bit more:

data.frame(var1=1:10,var2="aa")

And if you really want a character vector rather than a factor for the second column, you can use stringsAsFactors=FALSE or wrap var2 in I() (i.e. var2=I("aa")).
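A minimal sketch of those two options, with df1 and df2 as illustrative names. Note that from R 4.0.0 onward strings are no longer converted to factors by default, so this mainly matters on older versions such as the 2.13 in the question:

```r
# Option 1: ask data.frame not to turn strings into factors.
df1 <- data.frame(var1 = 1:10, var2 = "aa", stringsAsFactors = FALSE)

# Option 2: protect the column with I(); it keeps an extra "AsIs" class.
df2 <- data.frame(var1 = 1:10, var2 = I("aa"))

is.character(df1$var2)   # TRUE
is.character(df2$var2)   # TRUE
```

The I() variant is handy when only one column should be protected, while stringsAsFactors=FALSE applies to every string column in the call.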

Upvotes: 5

NPE

Reputation: 500237

> tmp <- data.frame(cbind(1:10,rep("aa",10)))
> str(tmp)
'data.frame':   10 obs. of  2 variables:
 $ X1: Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
 $ X2: Factor w/ 1 level "aa": 1 1 1 1 1 1 1 1 1 1

As you can see above, tmp$X1 got converted into a factor, which is what's causing the behaviour you're seeing.

Try:

tmp[,1] <- as.numeric(as.character(tmp[,1]))
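To illustrate why the detour through as.character is needed, here is a small sketch on a standalone factor (the variable names are made up for the example). as.numeric on a factor returns the internal integer codes, which index into the levels, not the numbers the labels spell out:

```r
# Build the factor explicitly, so this behaves the same in any R version.
f <- factor(c("1", "2", "3", "10"))

codes  <- as.numeric(f)                 # positions in levels(f), not the values
values <- as.numeric(as.character(f))   # the numbers the labels represent

levels(f)   # levels are sorted as strings, so "10" comes before "2"
codes
values      # 1 2 3 10
```

An equivalent form, assuming every level is a valid number, is as.numeric(levels(f))[f], which converts each distinct level once and then indexes by the codes.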

Upvotes: 6
