jkebinger

Reputation: 4154

Get and process entire row in ddply in a function

It's easy to grab one or more columns in ddply to process, but is there a way to grab the entire current row and pass that on to a function? Or to grab a set of columns determined at runtime?

Let me illustrate:

Given a data frame like

df = data.frame(a=seq(1,20), b=seq(1,5), c= seq(5,1))
df
    a b c
1   1 1 5
2   2 2 4
3   3 3 3

I could write a function to sum named columns along a row of a data frame like this:

selectiveSummer = function(row,colsToSum) {
   return(sum(row[,colsToSum])) 
}

It works when I call it for a row like this:

> selectiveSummer(df[1,],c('a','c'))
[1] 6

So I'd like to wrap that in an anonymous function and use it with ddply to apply it to every row of the data frame, something like the example below:

f = function(x) { selectiveSummer(x,c('a','c')) }
#this doesn't work!
ddply(df,.(a,b,c), transform, foo=f(row))

I'd like to find a solution where the set of columns to manipulate can be determined at runtime. So if there's some way to just splat those columns from ddply's arguments into a function that takes any number of arguments, that works too.
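A minimal base-R sketch of the "splat" idea, assuming the columns are chosen at runtime as a character vector (`cols` and `rowFun` are hypothetical names, not part of plyr): `do.call(mapply, ...)` passes the selected columns to `mapply`, which walks them in parallel and calls a variadic function once per row with that row's values as arguments.

```r
df <- data.frame(a = seq(1, 20), b = seq(1, 5), c = seq(5, 1))

# Columns chosen at runtime (hypothetical example value)
cols <- c("a", "c")

# Variadic row function: receives one argument per selected column, for a single row
rowFun <- function(...) sum(...)

# c() builds list(FUN = rowFun, a = df$a, c = df$c); mapply then calls
# rowFun(a = df$a[i], c = df$c[i]) for each row i
result <- do.call(mapply, c(list(FUN = rowFun), df[cols]))
result
```

Because the column set is just a character vector, swapping `sum` for the real per-row computation only means changing `rowFun`. plyr's `mdply()` is built around a similar idea (each row's columns become the function's arguments); check its documentation for the exact shape of its output.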

Edit: To be clear, the real application driving this isn't sum; sum just made for an easier explanation.

Upvotes: 3

Views: 3229

Answers (2)

Nicholas Hamilton

Reputation: 10506

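Simple... give each row a unique id and split on it, so every piece ddply hands to your function is exactly one row: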

library(plyr)
df$id = 1:nrow(df)
ddply(df,c('id'),function(x){ ... })

OR let adply split the data frame by rows (margin 1):

adply(df,1,function(x){ ... })
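For comparison, the same one-row-per-piece idea can be sketched in base R without plyr. This assumes the question's `df` and `selectiveSummer` (redefined here so the block is self-contained); `rows` is a hypothetical name. Splitting on a unique row index plays the role of the `id` column, so each piece is a one-row data frame.

```r
df <- data.frame(a = seq(1, 20), b = seq(1, 5), c = seq(5, 1))

selectiveSummer <- function(row, colsToSum) {
  sum(row[, colsToSum])
}

# One piece per row: split on a unique row index
rows <- split(df, seq_len(nrow(df)))

# Call the row function on each one-row data frame;
# vapply checks that every call returns a single number
df$foo <- vapply(rows, selectiveSummer, numeric(1), colsToSum = c("a", "c"))
```

Since `colsToSum` is an ordinary argument here, the column set can come from a variable computed at runtime.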

Upvotes: 0

GaBorgulya

Reputation: 617

You can only select single rows with ddply if each row can be uniquely identified by one or more variables. If there are identical rows, ddply will iterate over data frames of multiple rows even if you split on all columns (as in ddply(df, names(df), f)).
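A small base-R sketch of that behaviour, using a hypothetical data frame `dup` in which two rows are identical. Splitting on every column (the grouping ddply(df, names(df), f) performs) puts both identical rows into the same piece.

```r
# Hypothetical data: rows 1 and 2 are identical
dup <- data.frame(x = c(1, 1, 2), y = c(1, 1, 3))

# Split on the interaction of all columns; drop = TRUE keeps only combinations that occur
groups <- split(dup, dup, drop = TRUE)

# Rows per piece: the duplicated pair ends up in a single two-row piece
sapply(groups, nrow)
```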

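Why not use apply instead? apply does iterate over individual rows. It converts the data frame to a matrix first, so each row arrives as a named vector, which is why it is turned back into a one-row data frame with as.data.frame(t(x)) before calling f: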

apply(df, 1, function(x) f(as.data.frame(t(x))))

result:

[1]  6  6  6  6  6 11 11 11 11 11 16 16 16 16 16 21 21 21 21 21
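One caveat with the apply route: because apply() goes through as.matrix(), a data frame with mixed column types becomes a character matrix, so every row arrives as a character vector. A minimal sketch with a hypothetical mixed-type data frame `mixed` shows the coercion, plus a row-index alternative that keeps each column's type.

```r
# Hypothetical data frame with an integer and a character column
mixed <- data.frame(a = 1:3, label = c("x", "y", "z"))

# apply() coerces to a character matrix, so x[["a"]] is a string here
viaApply <- apply(mixed, 1, function(x) class(x[["a"]]))

# Indexing rows directly hands over a one-row data frame with types intact
viaIndex <- vapply(seq_len(nrow(mixed)), function(i) {
  row <- mixed[i, , drop = FALSE]  # one-row data frame
  class(row$a)
}, character(1))
```

The row-index version also passes f exactly what selectiveSummer expects (a one-row data frame), so the as.data.frame(t(x)) round trip is not needed.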

Upvotes: 4
