Unexpected behavior of apply v. for loop in R

Question

I want to use apply instead of a for loop to speed up a function that builds a character vector by paste-collapsing each row of a data frame containing strings and numbers with many decimals.
The speedup is notable, but apply pads the numbers on the left with spaces so that all values have the same number of characters, and it rounds them to integers, whereas the for loop does neither.
I was able to work around this by applying as.character to the numbers, but the data frame then uses much more memory, and I still don't know why apply does this. Does anyone have an explanation or a better solution?

Using apply:

df <- data.frame(V1=rep(letters[1:20], 1000/20), V2=(1:1000)+0.00000001,
                 V3=rep(letters[1:20], 1000/20), stringsAsFactors=FALSE)

system.time(varapl <- apply(df, 1, function(x){
                paste(x[1:3], collapse="_")
                }))
varapl[c(1,10,100,1000)]

Output:

  user  system elapsed 
  0.01    0.00    0.02 

[1] "a_   1_a" "j_  10_j" "t_ 100_t" "t_1000_t"
# Padded with spaces on the left and rounded!

Using for:

varfor <- NULL
system.time(for(i in 1:1000){
  varfor <- c(varfor, paste(df[i,1:3], collapse="_"))
})
varfor[c(1,10,100,1000)]

Output:

   user  system elapsed 
   0.19    0.00    0.19 

[1] "a_1.00000001_a"    "j_10.00000001_j"   "t_100.00000001_t"  "t_1000.00000001_t"
# This is what I'm looking for!

The workaround was:

df2 <- data.frame(V1=rep(letters[1:20], 1000/20),
                  V2=as.character((1:1000)+0.00000001),
                  V3=rep(letters[1:20], 1000/20), stringsAsFactors=FALSE)

# Re-run apply on the converted data frame
varapl <- apply(df2, 1, function(x) paste(x[1:3], collapse="_"))
varapl[c(1,10,100,1000)]

[1] "a_1.00000001_a"    "j_10.00000001_j"   "t_100.00000001_t"  "t_1000.00000001_t"

However:

object.size(df)
26816 bytes
object.size(df2)
97208 bytes

My original data frames have millions of entries, so both speed and memory constraints are important.

Thank you in advance for your comments! Keo.

Keo · Accepted Answer

@alexis_laz answered the question (Thanks!) by linking to this. I'm posting it here since it was mentioned in the comments section.
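
For anyone landing here without the link, here is a minimal sketch of what seems to be going on, using the same toy data as in the question. As far as I understand, apply first converts the data frame with as.matrix; because the columns have mixed types the result is a character matrix, and the numeric column is passed through format, which by default keeps 7 significant digits and pads to a common width. The do.call(paste, ...) line is one possible loop-free alternative (my own suggestion here, not necessarily what the linked answer proposes); the name varvec is just for illustration.

```r
# Same toy data as in the question.
df <- data.frame(V1 = rep(letters[1:20], 1000/20),
                 V2 = (1:1000) + 0.00000001,
                 V3 = rep(letters[1:20], 1000/20),
                 stringsAsFactors = FALSE)

# apply() works on a matrix, so the data frame is coerced first.
# With mixed column types this gives a character matrix, and the
# numeric column has already been formatted (rounded and padded).
m <- as.matrix(df)
m[c(1, 1000), "V2"]

# Possible alternative: paste() is vectorized over the columns, so it
# can be called once on the whole data frame. V2 is then converted
# with as.character(), which keeps the decimals.
varvec <- do.call(paste, c(df, sep = "_"))
varvec[c(1, 10, 100, 1000)]
```

Since this never builds the character matrix and leaves V2 stored as numeric, it should also avoid the extra memory of the as.character workaround, although I have not measured it on the full data.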
