Pierre D

Reputation: 26201

How to use apply to generate a data frame row by row?

I want to generate a dataframe row by row, by using some flavor of apply on a list of values and a function that returns a single-row data frame for each value. As a toy example, suppose that my values are i = 1:3 and that I have:

f <- function(i) {
    return(data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE))
}

I've been messing around with sapply, lapply, and a bunch of transposes, with no success. For example, d = sapply(1:3, f) looks promising, but appears to be the transpose of what I want. So I tried d = t(sapply(1:3, f)), but that is a matrix, not a data frame. Next I tried d = as.data.frame(t(sapply(1:3, f))), which looks right (it prints just like what I want) but is still wrong, as you find out when you subset it: d[,1], for example, is in fact a list.
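To make the failure mode above concrete, here is a minimal sketch using only base R and the toy f from this question. It shows that sapply simplifies the one-row data frames into a matrix whose cells are list elements, which is why the converted data frame prints correctly but has list columns.

```r
# Toy function from the question: returns a one-row data frame.
f <- function(i) {
  data.frame(img = letters[i], cached = FALSE, i = i, stringsAsFactors = FALSE)
}

m <- sapply(1:3, f)        # sapply simplifies the three results into a matrix
is.matrix(m)               # TRUE
typeof(m)                  # "list": every cell holds a list element

d <- as.data.frame(t(m))   # prints like the desired data frame
is.list(d[, 1])            # TRUE: the column is a list, not a character vector
```

Checking is.list() on a column is a quick way to detect this before subsetting goes wrong further down the line.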

Finally I got this, which works:

d = apply(data.frame(i=1:3), 2, f)$i

That gives me the frame I wanted:

  img cached i
1   a  FALSE 1
2   b  FALSE 2
3   c  FALSE 3

Is there a better/cleaner way to express the above? It all feels pretty kludgy and overly complicated to me.


Edit: as mentioned by several readers, this "toy example" is admittedly too simple, and indeed just f(1:3) would do what it looks like I am requesting. The actual function is part of a web-based metrics dashboard; it draws data from various DB tables and makes moderately complex plots, which I intend to cache (most of the time they change relatively slowly). The relevant part, I guess, is that the function typically takes several arguments, and those arguments aren't a simple sequence 1:n. So, let me rewrite the example to be a tad more realistic:

library(digest)
gkey   <- function(...) {
  args <- list(...)
  return(digest(paste(args, sep=".", collapse=".")))
}

f <- function(conn, table, checknew.query, plot.query, plot.fun, params) {
  latest.data = queryExec(conn, table, checknew.query, params)
  key = gkey(table, latest.data, plot.query, plot.fun, params)
  out = getFromCacheOrPlot(key, conn, table, plot.query, plot.fun, params)
  return(out)
}

where queryExec() builds a query, executes it, and retrieves the results; gkey() computes a hash key from its arguments; and getFromCacheOrPlot() uses the key to build a file name (a .png image), retrieving the image from the cache if it exists and generating it otherwise. getFromCacheOrPlot() also returns a data.frame with one row giving the file name, an HTML <img=...> blurb to display it, whether the plot came from the cache, and which parameters were used for the plot.

All this is used in a plugin for a wiki system, and certain pages have a dozen plots or more.

Upvotes: 7

Views: 5978

Answers (2)

Justin

Reputation: 43255

do.call(rbind, lapply(i, f)) will do what you're asking... but so would:

data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE)

As would:

f(i)
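As a minimal sketch of the do.call(rbind, lapply(...)) pattern, here it is applied to the toy f from the question, followed by a multi-argument variant using Map() from base R. The function g, the table names, and the parameter values are hypothetical stand-ins, not the asker's real code; the assumption is only that the real function returns a one-row data frame per call.

```r
# Toy function from the question: returns a one-row data frame.
f <- function(i) {
  data.frame(img = letters[i], cached = FALSE, i = i, stringsAsFactors = FALSE)
}

# Single argument: build a list of one-row frames, then stack them.
d <- do.call(rbind, lapply(1:3, f))

# Several arguments: g is a hypothetical stand-in for the real plotting function.
g <- function(table, param) {
  data.frame(table = table, param = param, cached = FALSE,
             stringsAsFactors = FALSE)
}
tables <- c("users", "orders", "events")   # made-up values
params <- c(10, 20, 30)                    # made-up values

# Map() calls g once per pair of arguments and always returns a list.
d2 <- do.call(rbind, Map(g, tables, params))
rownames(d2) <- NULL   # Map() names results after the first argument; reset them
```

Unlike sapply, lapply and Map never try to simplify their results, so each element stays a proper data frame and rbind produces ordinary atomic columns.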

Upvotes: 8

Jilber Urbina

Reputation: 61154

What about this? There is no need to use any flavor of apply function.

foo <- function(x){
  i <- seq_len(x)
  data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE)
}

foo(5)
  img cached i
1   a  FALSE 1
2   b  FALSE 2
3   c  FALSE 3
4   d  FALSE 4
5   e  FALSE 5

Upvotes: 3
