kjo

Apply function to heterogeneous rows of data.frame

Suppose that I have some (non-vectorized) function foo:

foo <- function (bar, baz, frobozz, frotz = 42) {
    if (frobozz) {
        frotz
    }
    else {
        bar * nchar(baz)
    }
}

It's a silly function, no doubt, but for the purpose of this question, take it as given. (In other words, answers that depend on modifying foo are out of bounds.)

Also, suppose that I have the data.frame df, as shown below:

> df
  frobozz bar baz
1    TRUE   1   a
2   FALSE   2   b
3    TRUE   3   c
4   FALSE   4   d
5    TRUE   5   e

Now, each row of df can be regarded as a heterogeneous named list (which I will henceforth call a record).

In fact, it's not difficult to cast any of df's rows as such a record:

> df[1, , drop = TRUE]
$frobozz
[1] TRUE

$bar
[1] 1

$baz
[1] "a"

Moreover, each named slot in such a record holds a value of a type suitable for the argument of the same name in foo's signature.

This means that I can use do.call to apply foo to any single row of df:

> do.call(foo, df[1, , drop = TRUE])
[1] 42
> do.call(foo, df[2, , drop = TRUE])
[1] 2

(Note that this works even though the ordering of df's columns and the ordering of foo's required arguments do not match.)

Now, I would like to create a new column by applying foo to every row of df.

I had hoped that apply would be up to the task, but it fails:

> apply(df, 1, foo)
Error in FUN(newX[, i], ...) : 
  argument "frobozz" is missing, with no default
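
A likely reason for this failure, sketched below under the assumption that df holds one logical, one numeric and one character column as in the listing above: apply first converts the data frame with as.matrix, which for mixed column types yields a character matrix, and then passes each row to the function as a single atomic vector. In foo(row) that vector is bound positionally to bar, so frobozz is never supplied.

```r
# Rebuild the example data frame (column types assumed from the listing above).
df <- data.frame(frobozz = c(TRUE, FALSE, TRUE, FALSE, TRUE),
                 bar = c(1, 2, 3, 4, 5),
                 baz = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

# apply() works on a matrix; a data frame with mixed column types
# is coerced to a character matrix.
m <- as.matrix(df)
print(typeof(m))  # "character"

# Each row handed to FUN is an atomic character vector, not a list,
# so a call like foo(row) binds the whole row to `bar`.
row_is_list <- apply(df, 1, is.list)
print(row_is_list)
```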

Of course, I could resort to something like this:

sapply(1:nrow(df), function (i) { do.call(foo, df[i, , drop = TRUE]) })

Is there a less ignorant-looking way to achieve this?


Here's a variation of this question that may be more tractable.

Consider the function foo_wrapper:

foo_wrapper <- function ( record ) {
    foo( record$bar, record$baz, record$frobozz )
}

This function is more flexible than foo, because all it requires is that its argument, record, have elements named bar, baz, and frobozz; it doesn't care about any other elements it may have. Also, one can apply foo_wrapper directly to df's rows, without having to resort to do.call:

> foo_wrapper(df[4, , drop = TRUE])
[1] 4

Unfortunately, apply fails with foo_wrapper as well:

> apply(df, 1, foo_wrapper)
Error in record$frobozz : $ operator is invalid for atomic vectors
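
If the goal is to keep handing each row to foo_wrapper as a genuine record, one possible sketch (an assumption, not the only approach) is to let Map rebuild a named list per row from the columns and pass that list on. Map is base R's mapply with SIMPLIFY = FALSE; the helper as_record and the column name result are hypothetical, introduced only for illustration.

```r
foo <- function (bar, baz, frobozz, frotz = 42) {
    if (frobozz) {
        frotz
    }
    else {
        bar * nchar(baz)
    }
}

foo_wrapper <- function (record) {
    foo(record$bar, record$baz, record$frobozz)
}

df <- data.frame(frobozz = c(TRUE, FALSE, TRUE, FALSE, TRUE),
                 bar = c(1, 2, 3, 4, 5),
                 baz = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: collects the i-th element of each column into a named list.
as_record <- function(...) list(...)

# Equivalent to Map(as_record, frobozz = df$frobozz, bar = df$bar, baz = df$baz),
# giving one named list (record) per row.
records <- do.call(Map, c(list(f = as_record), df))

# Each element of `records` is a plain list, so `$` inside foo_wrapper works.
df$result <- vapply(records, foo_wrapper, numeric(1))
print(df)
```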

Answers (1)

MrFlick

You can just Vectorize your function and then use with() to access the variables. For example, with your sample data:

dd <- read.table(text="frobozz bar baz
1    TRUE   1   a
2   FALSE   2   b
3    TRUE   3   c
4   FALSE   4   d
5    TRUE   5   e", header=TRUE, stringsAsFactors=FALSE)

Then you can run

with(dd, Vectorize(foo)(frobozz=frobozz, bar=bar, baz=baz))
# [1] 42  2 42  4 42
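
Vectorize is built on mapply, so, as a sketch using the same sample data, the same values can also be produced by calling mapply directly and stored as the new column the question asks for (the column name result is hypothetical). Splicing the columns in with do.call passes them by name, so their order does not need to match foo's signature.

```r
foo <- function (bar, baz, frobozz, frotz = 42) {
    if (frobozz) {
        frotz
    }
    else {
        bar * nchar(baz)
    }
}

dd <- read.table(text="frobozz bar baz
1    TRUE   1   a
2   FALSE   2   b
3    TRUE   3   c
4   FALSE   4   d
5    TRUE   5   e", header=TRUE, stringsAsFactors=FALSE)

# c(list(FUN = foo), dd) is list(FUN = foo, frobozz = ..., bar = ..., baz = ...),
# so this runs mapply(FUN = foo, frobozz = dd$frobozz, bar = dd$bar, baz = dd$baz).
dd$result <- do.call(mapply, c(list(FUN = foo), dd))
print(dd$result)  # [1] 42  2 42  4 42
```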
