Liz

Reputation: 45

Storing dataframe automatically converts character into numeric. How to stop this?

My data looks a bit like:

dummy.from <- data.frame(SetID = rep(c(104:109), times=4), Name = rep(c("A1", "A2", "A3", "A4"), each=6), Value=sample(c(1:100,0.5), 24) )

So:

    SetID Name Value
1    104   A1    82
2    105   A1    79
3    106   A1    54
4    107   A1    87
5    108   A1    62
6    109   A1    28
7    104   A2    37
8    105   A2    72
9    106   A2   100
10   107   A2    64
11   108   A2    14
...

Basically, what I want to do is to transfer part of the data to another data frame, based on another value (not shown) calculated separately for each SetID.

For that I use a for loop like:

dummy.to <- data.frame(SetID=numeric(0), Name=character(0), value=numeric(0), stringsAsFactors=FALSE)

for(i in 104:109){
  dummy.to[(nrow(dummy.to)+1):(nrow(dummy.to)+4),] <- dummy.from[dummy.from$SetID==i,]
}

The problem I encounter is this: the subset on the right-hand side (dummy.from[dummy.from$SetID==i,]) looks exactly the way I want it to be stored, but when I then look at dummy.to, the Name column has for some reason been converted to numbers:

> dummy.to
   SetID Name value
1    104    1    82
7    104    2    37
13   104    3    52
19   104    4    73
2    105    1    79
8    105    2    72
14   105    3    91
....

Strangely, when I look at the structure (str(dummy.to)), the Name column is still of type character. I'm really confused about this, as I'd like my names to show up as they were in the original data.frame. I know I could just write a loop to change the names back, but I'm wondering whether I'm overlooking something in the code that could be causing this behaviour.

Any advice would be much appreciated!

Upvotes: 0

Views: 217

Answers (3)

Pierre L

Reputation: 28441

Your for loop is sorting the data frame by the "SetID" column. There is a function for that called order:

dummy.from[order(dummy.from$SetID),]

Or, using the devel version of data.table, you can order your data by reference. Link here: Installation: data.table

library(data.table) ## v 1.9.5+
setorder(dummy.from, SetID)

Upvotes: 1

nsm

Reputation: 319

I'm not sure I understood what you need to do, but in the first place it seems that you don't need a for loop. Instead, to obtain the result you need:

dummy.to <- dummy.from[dummy.from$SetID %in% 104:109,]
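When subsetting against several values, note that == compares element by element and recycles the shorter vector, whereas %in% tests membership. A minimal sketch with made-up vectors (set_ids and wanted are hypothetical names, not from the question):

```r
# Hypothetical IDs in an arbitrary order, and the values we want to keep.
set_ids <- c(105, 104, 107, 106)
wanted  <- c(104, 105)

# `==` recycles `wanted` to 104, 105, 104, 105 and compares position by position.
by_equals <- set_ids == wanted

# `%in%` asks, for each element of set_ids, whether it occurs anywhere in `wanted`.
by_in <- set_ids %in% wanted

print(by_equals)
print(by_in)
```

With the question's data, == happens to line up because SetID repeats 104:109 in exactly that order; %in% does not depend on the row order.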

The problem you mentioned about the types arises because the Name column in dummy.from is not character but a factor, which is stored internally as integer codes.
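To see what "stored as integer codes" means, here is a small sketch on a standalone factor (name_factor is a hypothetical variable built for illustration, using the same labels as the question):

```r
# A factor keeps integer codes plus a table of levels.
name_factor <- factor(rep(c("A1", "A2", "A3", "A4"), each = 2))

codes  <- as.integer(name_factor)    # the underlying integer codes
labels <- as.character(name_factor)  # the original text labels

print(levels(name_factor))
print(codes)
print(labels)
```

When a factor is assigned into a character column, it is the codes that can end up being stored, which would explain the "1", "2", "3", "4" in dummy.to.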

Upvotes: 0

Jason

Reputation: 1569

data.frame automatically converts strings to factors (this was the default before R 4.0.0). You want to change that.

dummy.from <- data.frame(SetID = rep(c(104:109), times=4), Name = rep(c("A1", "A2", "A3", "A4"), each=6), Value=sample(c(1:100,0.5), 24) )
str(dummy.from)
'data.frame':   24 obs. of  3 variables:
 $ SetID: int  104 105 106 107 108 109 104 105 106 107 ...
 $ Name : Factor w/ 4 levels "A1","A2","A3",..: 1 1 1 1 1 1 2 2 2 2 ...
 $ Value: num  37 9 69 38 93 71 91 34 86 51 ...

Here's what you want

dummy.from <- data.frame(SetID = rep(c(104:109), times=4), Name = rep(c("A1", "A2", "A3", "A4"), each=6), Value=sample(c(1:100,0.5), 24), stringsAsFactors = FALSE) # your desired output just requires stringsAsFactors = FALSE
> str(dummy.from)
'data.frame':   24 obs. of  3 variables:
 $ SetID: int  104 105 106 107 108 109 104 105 106 107 ...
 $ Name : chr  "A1" "A1" "A1" "A1" ...
 $ Value: num  80 46 61 52 38 9 7 59 15 56 ...
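If dummy.from already exists with a factor Name column and you would rather not rebuild it, another option is to convert that column with as.character before copying rows. A minimal sketch based on the question's loop; set.seed and the explicit stringsAsFactors = TRUE are additions made here only to keep the example reproducible and to mimic the old default:

```r
set.seed(1)  # assumption: only for reproducibility of sample()

# Rebuild the question's data with Name as a factor (the pre-4.0.0 default).
dummy.from <- data.frame(
  SetID = rep(104:109, times = 4),
  Name  = rep(c("A1", "A2", "A3", "A4"), each = 6),
  Value = sample(c(1:100, 0.5), 24),
  stringsAsFactors = TRUE
)

# Convert the factor back to its text labels before copying anything.
dummy.from$Name <- as.character(dummy.from$Name)

# Same target data frame and loop as in the question.
dummy.to <- data.frame(SetID = numeric(0), Name = character(0),
                       value = numeric(0), stringsAsFactors = FALSE)

for (i in 104:109) {
  rows <- (nrow(dummy.to) + 1):(nrow(dummy.to) + 4)
  dummy.to[rows, ] <- dummy.from[dummy.from$SetID == i, ]
}

str(dummy.to)
```

Because Name is already character when the rows are copied, the labels rather than the integer codes should land in dummy.to.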

Upvotes: 1
