Nelviticus

Reputation: 151

Apply function to chunks of a data frame

I'm a C# programmer who's been asked to do some work in R. I need to figure out how to call a function multiple times, passing in 'chunks' of a data frame: the function should be called once for each distinct combination of values in the first two columns.

Here's what I mean:

Stratum<-c("FPN", "FPN", "FPN", "MPN", "MPN", "MPN")
Cal<-c("ynnn", "ynnn", "yynn", "ynnn", "ynnn", "yynn")
Band.1<-c(1,2,1,1,2,1)
Band.2<-c(2,3,2,2,3,2)
Regroup<-c("No","Yes","No","Yes","No","No")
decs.data<-data.frame(Stratum,Cal,Band.1,Band.2,Regroup,stringsAsFactors=FALSE)

Stratum  Cal Band.1 Band.2 Regroup
    FPN ynnn      1      2      No
    FPN ynnn      2      3     Yes
    FPN yynn      1      2      No
    MPN ynnn      1      2     Yes
    MPN ynnn      2      3      No
    MPN yynn      1      2      No

For the above data I'd call the function four times - once passing it all the rows of decs.data where Stratum="FPN" and Cal="ynnn", then where Stratum="FPN" and Cal="yynn" and so on.

The function won't operate on those rows, it uses them to determine which data file to load from disc and what to do with it.

How would I go about calling a function this way in R? I'm sure 'apply' must be involved but I'm struggling to figure out how.

UPDATE: I don't need all the rows in the data.frame as arguments to the function, just the matching ones (i.e. rows 1 & 2 for the 1st call, 3 for the 2nd, 4 & 5 for the 3rd and 6 for the 4th).

The function will load a data file based on the Stratum & Cal columns (e.g. FPN.ynnn.rdata) then decide how to process it based on the Band.1, Band.2 and Regroup columns.

Essentially, decs.data is not the data I want to manipulate but a decisions matrix defining which bands in which rdata files need to be regrouped.

Upvotes: 1

Views: 1047

Answers (1)

nograpes

Reputation: 18323

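You are looking for by. If you want to run your function on subsets of decs.data, using Stratum and Cal as the splitting variables, you can do:

by(decs.data,decs.data[c('Stratum','Cal')],your_function)

where your_function is a placeholder for your own function (function itself is a reserved word in R, so it can't be used as the name). It is called once per Stratum/Cal combination and receives the matching rows as a data frame.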
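For illustration, here is a minimal sketch of that call on the data from the question. The name process_chunk and its body are assumptions made up for this example: it only builds the file name from Stratum and Cal (following the FPN.ynnn.rdata pattern mentioned in the question) and counts the rows it was given, rather than loading anything from disc.

```r
# Decisions matrix from the question
Stratum <- c("FPN", "FPN", "FPN", "MPN", "MPN", "MPN")
Cal     <- c("ynnn", "ynnn", "yynn", "ynnn", "ynnn", "yynn")
Band.1  <- c(1, 2, 1, 1, 2, 1)
Band.2  <- c(2, 3, 2, 2, 3, 2)
Regroup <- c("No", "Yes", "No", "Yes", "No", "No")
decs.data <- data.frame(Stratum, Cal, Band.1, Band.2, Regroup,
                        stringsAsFactors = FALSE)

# Hypothetical stand-in for the real function: 'chunk' is a data frame
# holding only the rows for one Stratum/Cal combination.
process_chunk <- function(chunk) {
  # Stratum and Cal are constant within a chunk, so the first row is enough
  file_name <- paste(chunk$Stratum[1], chunk$Cal[1], "rdata", sep = ".")
  # The real function would load file_name here and use Band.1, Band.2
  # and Regroup to decide what to do; this sketch just reports what it saw.
  data.frame(file = file_name, rows = nrow(chunk))
}

# One call to process_chunk per distinct Stratum/Cal combination
result <- by(decs.data, decs.data[c("Stratum", "Cal")], process_chunk)

# Stack the per-chunk results into a single data frame
summary_df <- do.call(rbind, result)
print(summary_df)
```

With this data, process_chunk is called four times, once per combination. If the function is only needed for its side effects, the do.call(rbind, ...) step can be dropped.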

Upvotes: 4
