tonytonov

Reputation: 25638

Avoid rbind()/cbind() conversion from numeric to factor

I'm trying to build a dataset before plotting it. I decided to use a function factory, gammaplot.ff(), and the first version of my code looks like this:

PowerUtility1d <- function(x, delta = 4) {
  return(((x+1)^(1 - delta)) / (1 - delta))
}
PowerUtility1d <- Vectorize(PowerUtility1d, "x")

# function factory allows multiparametrization of PowerUtility1d()
gammaplot.ff <- function(type, gamma) {
  ff <- switch(type, 
               original = function(x) PowerUtility1d(x/10, gamma),
               pnorm_wrong = function(x) PowerUtility1d(2*pnorm(x)-1, gamma),
               pnorm_right = function(x) PowerUtility1d(2*pnorm(x/3)-1, gamma)
              )
  ff
}

gammaplot.df <- data.frame(type=numeric(), gamma=numeric(), 
                           x=numeric(), y=numeric())
gammaplot.gamma <- c(1.1, 1.3, 1.5, 2:7)
gammaplot.pts <- (-1e4:1e4)/1e3

# building the data set
for (gm in gammaplot.gamma) {
   for (tp in c("original", "pnorm_wrong", "pnorm_right")) {
     fpts <- gammaplot.ff(tp, gm)(gammaplot.pts)    
     dataChunk <- cbind(tp, gm, gammaplot.pts, fpts)
     colnames(dataChunk) <- names(gammaplot.df)
     gammaplot.df <- rbind(gammaplot.df, dataChunk)
   }
}

# rbind()/cbind() cast all data to character, but x and y are numeric
gammaplot.df$x <- as.numeric(as.character(gammaplot.df$x))
gammaplot.df$y <- as.numeric(as.character(gammaplot.df$y))
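
The coercion mentioned in the comment above can be reproduced in isolation. This is a minimal sketch using only base R; the names chunk and chunk.df are illustrative and not part of the original code. cbind() on plain vectors returns a matrix, and a matrix can hold only one type, so a single string forces every value to character. A data frame stores each column separately, so the types survive.

```r
# cbind() on atomic vectors builds a matrix; mixing a string with numbers
# coerces every cell to character.
chunk <- cbind("original", 1.1, c(0.5, 1.5))
is.matrix(chunk)
typeof(chunk)

# A data frame keeps one type per column, so the numeric columns stay numeric.
# stringsAsFactors = FALSE is spelled out for R versions before 4.0.0.
chunk.df <- data.frame(type = "original", gamma = 1.1, x = c(0.5, 1.5),
                       stringsAsFactors = FALSE)
sapply(chunk.df, class)
```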

It turns out that the whole data frame contains character data, so I have to convert it back manually (it took me a while to discover that in the first place!). A search on SO indicates that this happens because the type variable is character. To avoid this (you can imagine the performance issues with character data while building the data set!), I changed the code a bit:

gammaplot.ff <- function(type, gamma) {
  ff <- switch(type, 
               function(x) PowerUtility1d(x/10, gamma),
               function(x) PowerUtility1d(2*pnorm(x)-1, gamma),
               function(x) PowerUtility1d(2*pnorm(x/3)-1, gamma)
              )
  ff
}

for (gm in gammaplot.gamma) {
  for (tp in 1:3) {
    fpts <- gammaplot.ff(tp, gm)(gammaplot.pts)    
    dataChunk <- cbind(tp, gm, gammaplot.pts, fpts)
    colnames(dataChunk) <- names(gammaplot.df)
    gammaplot.df <- rbind(gammaplot.df, dataChunk)
  }
}

This works fine for me, but I lost the self-explanatory character parameter, which is a downside. Is there a way to keep the first version of the function factory without an implicit conversion of all data to character?

If there's another way of achieving the same result, I'd be happy to try it out.

Upvotes: 49

Views: 61214

Answers (3)

kraggle

Reputation: 347

Whenever I use rbind or rbind.data.frame, the columns are turned into characters, even if I pass stringsAsFactors = FALSE. What worked for me was wrapping the new row in data.frame() before binding:

rbind.data.frame(df, data.frame(ColNam = data, Col2 = data), stringsAsFactors = FALSE)
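
A small self-contained sketch of that pattern, assuming an initially empty data frame. The column names type and x and the values are made up for illustration; they stand in for ColNam, Col2 and data above.

```r
# Start from an empty data frame with the intended column types.
df <- data.frame(type = character(), x = numeric(), stringsAsFactors = FALSE)

# Wrap each new row in data.frame() so every column keeps its own type,
# then bind it with rbind.data.frame().
df <- rbind.data.frame(
  df,
  data.frame(type = "original", x = 0.5, stringsAsFactors = FALSE),
  stringsAsFactors = FALSE)
df <- rbind.data.frame(
  df,
  data.frame(type = "pnorm_right", x = 1.5, stringsAsFactors = FALSE),
  stringsAsFactors = FALSE)

str(df)
```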

Upvotes: 0

HBat

Reputation: 5702

I want to bring @mtelesha's comment to the front.

Use stringsAsFactors = FALSE in cbind or cbind.data.frame:

x <- data.frame(a = letters[1:5], b = 1:5)
y <- cbind(x, c = LETTERS[1:5])
class(y$c)
## "factor"
y <- cbind.data.frame(x, c = LETTERS[1:5])
class(y$c)
## "factor"
y <- cbind(x, c = LETTERS[1:5], stringsAsFactors = FALSE)
class(y$c)
## "character"
y <- cbind.data.frame(x, c = LETTERS[1:5], stringsAsFactors = FALSE)
class(y$c)
## "character"

UPDATE (May 5, 2020):

As of R version 4.0.0, R uses a stringsAsFactors = FALSE default in calls to data.frame() and read.table().

https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/

Upvotes: 8

shadow

Reputation: 22343

You can use rbind.data.frame and cbind.data.frame instead of rbind and cbind.
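
Applied to the loop from the question, this could look like the sketch below. It is not a definitive rewrite and makes a few assumptions: the grid is coarser than the one in the question to keep the example small, the chunks are collected in a list (the chunks name is illustrative) and bound once at the end instead of growing the data frame inside the loop, and stringsAsFactors = FALSE is passed explicitly for R versions before 4.0.0.

```r
PowerUtility1d <- function(x, delta = 4) ((x + 1)^(1 - delta)) / (1 - delta)

# Function factory from the question, keeping the character type argument.
gammaplot.ff <- function(type, gamma) {
  switch(type,
         original    = function(x) PowerUtility1d(x / 10, gamma),
         pnorm_wrong = function(x) PowerUtility1d(2 * pnorm(x) - 1, gamma),
         pnorm_right = function(x) PowerUtility1d(2 * pnorm(x / 3) - 1, gamma))
}

gammaplot.gamma <- c(1.1, 1.3, 1.5, 2:7)
gammaplot.pts <- (-10:10) / 10  # assumption: coarser grid than the original

chunks <- list()
for (gm in gammaplot.gamma) {
  for (tp in c("original", "pnorm_wrong", "pnorm_right")) {
    fpts <- gammaplot.ff(tp, gm)(gammaplot.pts)
    # cbind.data.frame() builds a data frame, so type stays character
    # while gamma, x and y stay numeric.
    chunks[[length(chunks) + 1]] <- cbind.data.frame(
      type = tp, gamma = gm, x = gammaplot.pts, y = fpts,
      stringsAsFactors = FALSE)
  }
}

# Bind all chunks in one call.
gammaplot.df <- do.call(rbind.data.frame, chunks)
sapply(gammaplot.df, class)
```

With this version the as.numeric(as.character(...)) conversion after the loop is no longer needed.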

Upvotes: 100
