ipetrik

Reputation: 2054

Using column name passed to function in ggplot aes, data.frame, and nls

OK, let's say I have the following data in a CSV file ("example_data.csv"):

Likelihood,Weight,Par1,Par2,Par3
0.186844384,0.036923697,2,2,58
0.533218654,0.501397958,0,0,65
0.242303977,0.003077206,1,1,46
0.345092541,0.444826685,2,2,23
0.293672855,0.108440953,2,3,29
0.287151901,0.788640671,2,2,45
0.662063373,0.995332406,-1,-2,71
0.515526137,0.089007922,-1,-1,110
0.330131798,0.419704507,1,1,43
0.340537446,0.384904805,-1,-1,78
0.42350387,0.817862511,0,0,94
0.278387583,0.912293985,1,2,53
0.413520775,0.465414836,1,1,56
0.111797213,0.276860883,3,3,26
0.420515164,0.642712917,1,1,68
0.30835086,0.882109026,1,1,24
0.576850063,0.518219853,0,-2,81
0.355660735,0.790567044,0,0,29
0.979357518,0.039895315,-4,-4,177
0.656909082,0.404682824,-2,-4,101
0.48684488,0.488388762,-2,-3,144
0.806577308,0.530345186,-2,-3,143
0.658578518,0.970476957,-2,-5,160
0.521646556,0.723287454,2,3,83
0.60702761,0.727149894,-2,-4,155
0.694971183,0.071413935,3,4,22
0.351835995,0.98549942,-1,-1,81
0.916744944,0.867929188,-1,-2,91
0.646122983,0.395781956,-1,-2,95
0.292583756,0.907615016,-1,-1,89
0.500997719,0.7635543,-2,-4,142
0.827681213,0.094512069,-2,-5,149
0.904759491,0.374158994,-3,-4,97
0.783803411,0.962195178,-3,-4,102
0.382691023,0.41835611,0,0,21
0.290186245,0.842489929,2,2,10
0.417623103,0.413883742,-3,-4,145
0.813249374,0.265328688,-2,-3,102
0.882071817,0.817630957,-2,-4,99
0.849050068,0.101411688,-2,-2,61
0.390254013,0.637964495,1,1,22
0.243507734,0.070444932,2,3,15
0.259785717,0.501507883,2,2,5
0.685399514,0.347204068,-3,-5,152
0.483162564,0.724026851,-3,-4,121
0.828930794,0.71894471,0,-1,50
0.282705441,0.551101402,1,1,21
0.09732417,0.113851154,3,4,29
0.22818404,0.000950461,1,1,32
0.132510088,0.654162829,0,0,58
0.229581317,0.099388171,1,2,99
0.768479467,0.014822263,-2,-3,126
0.572649738,0.465394695,-1,-1,107
0.195123412,0.677059169,0,0,64
0.602264748,0.128128995,-1,-1,112
0.566370697,0.454819417,-3,-5,180
0.962733978,0.909347539,-5,-3,215
0.762192377,0.840566094,-3,-4,194
0.909048091,0.146816754,-2,-4,205
0.411053888,0.199181775,-1,-2,38
0.262232454,0.144137241,-1,-1,74
0.437649773,0.583755593,-1,-2,76
0.71896061,0.147700762,-2,-3,103
0.697941592,0.080480032,-2,-3,77
0.500277498,0.649807717,-3,-4,98
0.437533815,0.006917082,-1,-1,27
0.276252625,0.776412941,0,0,56
0.660321112,0.516544613,-1,-2,94
0.396011967,0.1709671,-2,-3,98
0.539238702,0.703846181,-2,-3,125
0.998578074,0.106352132,-2,-4,184
0.552325405,0.970471559,-3,-5,109
0.380106473,0.948651389,0,0,60
0.887789916,0.328624317,-3,-4,159

which I load into a data frame by the standard means:

dat <- read.csv("example_data.csv")

I'm trying to write a function that will, for a given column name, compute an nls fit and then plot the fit and the data, using the given column as the x value (with a bit of jitter, " + runif(10,-0.1,0.1)", to relieve overlap):

plotfun <- function (data, parameter) {
  start <- getInitial(Likelihood~SSlogis(substitute(parameter),alpha,xmid,scale),data)
  m <- nls(Likelihood~1/(1+exp((xmid-substitute(parameter))/scale)), start=start[c(2,3)], data=data, weight=Weight)

  pred <- data.frame(substitute(parameter)=seq(min(data$parameter),max(data$parameter),length.out=100))
  pred$y <- predict(m, newdata=pred)

  p <- ggplot (data, aes_q (y=~Likelihood, x=substitute(parameter+runif(10,-0.1,0.1))))
  p + geom_point(size = 1) + geom_line(data=pred, aes_q(x=substitute(parameter),y=~y))
}

plotfun(dat, Par1)

But this fails... Basically, I don't understand when I'm supposed to use the bare variable name and where I'm supposed to use substitute, or some other function I'm not aware of.

Can someone please explain how to write this function properly?

Upvotes: 1

Views: 103

Answers (2)

MrFlick

Reputation: 206253

Here's another answer, where you just pass in the column name as a string:

library(ggplot2)

plotfun <- function (data, parameter) {
  # Copy the chosen column into a fixed-name column the formulas can refer to
  data$.var. <- data[[parameter]]

  start <- getInitial(Likelihood~SSlogis(.var.,alpha,xmid,scale),data)
  m <- nls(Likelihood~1/(1+exp((xmid-.var.)/scale)), start=start[c(2,3)], data=data, weights=Weight)

  # The prediction grid must use the same column name as the model formula
  pred <- data.frame(.var. = seq(min(data[[parameter]]),max(data[[parameter]]),length.out=100))
  pred$y <- predict(m, newdata=pred)

  # Jitter x by up to 0.1 to reduce overlap; nrow(data) avoids hard-coding 74
  p <- ggplot (data, aes(y=Likelihood, x=.var.+runif(nrow(data),-0.1,0.1)))
  p + geom_point() + geom_line(data=pred, aes(x=.var., y=y)) + xlab(parameter)
}

plotfun(dat, "Par1")

We just copy the chosen column into a new column named .var., so that most of the code can refer to a fixed name, and then set the x-axis label back to the original column name at the end.
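The same fixed-name trick works for any modelling function that takes a formula plus a newdata frame. Below is a minimal base-R sketch of just that part; it uses lm() as a stand-in for nls() so it runs without starting values, and the helper fit_by_column and the toy data frame are hypothetical, not part of the answer above.

```r
# Hypothetical helper: copy the column named by the string `parameter`
# into a fixed-name column .var., so neither the formula nor the
# prediction grid needs to know the real column name.
fit_by_column <- function(data, parameter) {
  data$.var. <- data[[parameter]]
  m <- lm(Likelihood ~ .var., data = data)   # stand-in for the nls() fit

  # The prediction grid uses the same fixed name, so predict() finds it
  pred <- data.frame(.var. = seq(min(data[[parameter]]), max(data[[parameter]]),
                                 length.out = 5))
  pred$y <- predict(m, newdata = pred)
  pred
}

# Toy data where Likelihood = 0.5 - 0.2 * Par1 exactly
toy <- data.frame(Par1 = c(-2, -1, 0, 1, 2),
                  Likelihood = c(0.9, 0.7, 0.5, 0.3, 0.1))
res <- fit_by_column(toy, "Par1")
print(res)
```

Because the column name only appears once (in data[[parameter]]), calling fit_by_column(toy, "Par2") on a frame with a Par2 column would need no other changes.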

Upvotes: 1

MrFlick

Reputation: 206253

R does not do text-based substitution macros the way SAS or the C preprocessor does. When you need to build expressions, you need to make sure they are of the right type so R knows which values to evaluate and which not to. If you have several places where you want to replace a certain symbol with another symbol, you can use substitute(). Here's a rewrite of your function:

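plotfun <- function (data, parameter) {
  # Capture the unevaluated symbol the caller passed (e.g. Par1)
  p <- substitute(parameter)
  # Build a code block with every occurrence of the symbol parameter replaced by p
  expr <- substitute({
    start <- getInitial(Likelihood~SSlogis(parameter,alpha,xmid,scale),data)
    m <- nls(Likelihood~1/(1+exp((xmid-parameter)/scale)), start=start[c(2,3)], data=data, weights=Weight)

    # Argument names are not substituted, so name the column with setNames()
    pred <- setNames(data.frame(seq(min(data$parameter),max(data$parameter),length.out=100)), as.character(expression(parameter)))
    pred$y <- predict(m, newdata=pred)

    p <- ggplot (data, aes(y=Likelihood, x=parameter+runif(nrow(data),-0.1,0.1)))
    p + geom_point() + geom_line(data=pred, aes(x=parameter,y=y))
  }, list(parameter=p))
  # Evaluate the rewritten block in this function's environment
  eval(expr)
}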
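The core mechanism, substitute() swapping one symbol for another inside an unevaluated call, can be seen on its own in this small base-R sketch (the toy data frame is made up). Nothing is evaluated until eval() is called, and what comes back is a call object, not a string.

```r
# Replace the symbol `parameter` with the symbol `Par1` in an unevaluated call;
# note that it also rewrites the right-hand side of `$`.
rewritten <- substitute(max(data$parameter) - min(data$parameter),
                        list(parameter = quote(Par1)))
print(rewritten)   # max(data$Par1) - min(data$Par1)
class(rewritten)   # "call"

# Evaluate the rewritten call against a toy data frame
toy <- data.frame(Par1 = c(3, 1, 2))
eval(rewritten, list(data = toy))   # 2
```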

Since you want to perform non-standard evaluation by passing an unevaluated symbol to your function, you need to do some extra work. Here we use substitute() on the parameter argument to capture the symbol that's in that argument's promise. Then we use substitute() again to replace every occurrence of parameter in a block of code with whatever symbol you passed in. Finally, we eval() that new code block.
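Stripped of the model fitting and plotting, the capture / substitute / eval pattern looks like the sketch below; col_span and the toy data frame are hypothetical and only serve to show the three steps.

```r
col_span <- function(data, parameter) {
  p <- substitute(parameter)          # step 1: capture the caller's symbol, e.g. Par1
  expr <- substitute({                # step 2: splice it into a code block
    lo <- min(data$parameter)
    hi <- max(data$parameter)
    hi - lo
  }, list(parameter = p))
  eval(expr)                          # step 3: run the block in this function's frame
}

toy <- data.frame(Par1 = c(2, 0, -1, 3), Par2 = c(2, 0, -2, 4))
col_span(toy, Par1)   # 4
col_span(toy, Par2)   # 6
```

eval() with no envir argument evaluates in the calling frame, which here is col_span's own environment, so data refers to the function argument.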

The one weird thing here is that argument names in function calls (such as the a in data.frame(a=1)) aren't proper symbols in the way that substitute() sees them; they are just the names of named arguments, so substitute() leaves them alone. So we essentially deparse the symbol we've passed in into a character string and use setNames() with that string to name the column.
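A quick way to see the argument-name behaviour: substitute() rewrites the value side of parameter = parameter but not the name, so the resulting column is literally called "parameter". The sketch then applies the setNames() workaround; it uses deparse(p), whereas the answer uses as.character(expression(parameter)), and both yield the string "Par1". The toy values are made up.

```r
p <- quote(Par1)   # the symbol a caller might pass, e.g. plotfun(dat, Par1)

# Only the value is rewritten; the argument name stays "parameter"
expr <- substitute(data.frame(parameter = parameter), list(parameter = p))
print(expr)        # data.frame(parameter = Par1)

df_wrong <- eval(expr, list(Par1 = c(1, 2, 3)))
names(df_wrong)    # "parameter"

# Workaround: build the column unnamed, then name it from the deparsed symbol
df_right <- setNames(data.frame(seq(0, 1, length.out = 3)), deparse(p))
names(df_right)    # "Par1"
```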

So basically I used substitute() twice: once to capture the unevaluated symbol passed to the function, and once to rewrite the code in a block. I also used plain aes() rather than aes_q().

An easier approach probably would have been to pass in the column name as a string. There are often better alternatives for building code dynamically from character values than from symbols.
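As one example of working from a character column name, the model formula itself can be built from the string. This sketch uses bquote() with as.name() to splice the column into the same logistic formula used in the question; the helper logis_formula is hypothetical, and the result is an ordinary formula object that could then be passed to nls() along with data, start and weights.

```r
# Hypothetical helper: build Likelihood ~ 1/(1 + exp((xmid - <col>)/scale))
# for a column name supplied as a string.
logis_formula <- function(parameter) {
  sym <- as.name(parameter)                                # "Par1" -> symbol Par1
  f <- bquote(Likelihood ~ 1/(1 + exp((xmid - .(sym))/scale)))
  eval(f)                                                  # evaluate the ~ call to get a formula
}

f <- logis_formula("Par1")
print(f)
all.vars(f)   # "Likelihood" "xmid" "Par1" "scale"
```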

Upvotes: 1
