With R, iterate over data frames, perform math operations on each one, append results in new data frame

Question

I have a huge data frame (A) in R that looks like this:

Letters   Frequency   Numbers
a         0.15        1
b         0.67        2
c         0.85        7
d         0.4         3

I would like to first divide A into 4 data frames according to the range of values of the "Frequency" column (4 frequency bins of size 0.25), which go from 0 to 1, so that I obtain:

A1 [0, 0.25]

Letters   Frequency   Numbers
a         0.15        1

A2 [0.25, 0.5]

Letters   Frequency   Numbers
d         0.4         3

A3 [0.5, 0.75]

Letters   Frequency   Numbers
b         0.67        2

A4 [0.75, 1]

Letters   Frequency   Numbers
c         0.85        7

Iterating over the frequency bins in order, I would like to perform math operations on A1, A2, A3 and A4 (e.g., op1 = Numbers - 2; op2 = Numbers * 10) and create a data frame B with the appended results:

B

bin            op1    op2   
[0, 0.25]      -1     10 
[0.25, 0.5]     1     30 
[0.5, 0.75]     0     20 
[0.75, 1]       5     70

I guess I don't need to create A1, A2, A3 and A4 (ideally I would like to work only with A) and that there is a more elegant way of getting B by iterating through the frequency bins directly, but I created them in this example to clearly explain the principle. I think this can be done with lapply, but I am not sure how. Thank you very much in advance.

lmo · Accepted Answer

A base R method following the split-apply-combine strategy is

# dd is the question's data frame (called A there)
bins <- findInterval(dd$Frequency, c(0, .25, .5, .75, 1))
do.call(rbind, lapply(split(dd, bins), function(i) {
  within(i, {  # create the new variables, then drop Numbers
    op1 <- Numbers - 2
    op2 <- Numbers * 10
    Numbers <- NULL
  })
}))

This returns

  Letters Frequency op2 op1
1       a      0.15  10  -1
2       d      0.40  30   1
3       b      0.67  20   0
4       c      0.85  70   5

Here, findInterval assigns each observation to a bin based on Frequency, and split divides the data.frame by these bins, which also orders the groups by bin. lapply and within then construct the new variables for each group, and do.call(rbind, ...) stacks the per-group results back into a single data.frame.
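
If the result should carry a bin label, as in the B sketched in the question, one possible variation is to build the groups with cut instead of findInterval. The following is only a sketch under some assumptions: the data frame is rebuilt by hand under the question's name A, the column name bin is introduced here for illustration, and include.lowest = TRUE is one choice for the bin edges (a Frequency of exactly 0 goes in the first bin, and intervals are closed on the right).

```r
# Example data from the question (the answer above calls this data frame dd)
A <- data.frame(
  Letters   = c("a", "b", "c", "d"),
  Frequency = c(0.15, 0.67, 0.85, 0.40),
  Numbers   = c(1, 2, 7, 3)
)

# Label each row with its frequency bin; cut() returns a factor whose
# levels follow the order of the breaks
breaks <- c(0, 0.25, 0.5, 0.75, 1)
A$bin <- cut(A$Frequency, breaks = breaks, include.lowest = TRUE)

# Split by bin (groups come out in bin order), compute the operations
# for each group, then stack the pieces into one data frame
B <- do.call(rbind, lapply(split(A, A$bin), function(g) {
  data.frame(bin = g$bin, op1 = g$Numbers - 2, op2 = g$Numbers * 10)
}))
rownames(B) <- NULL  # drop the row names that rbind derives from the list names
B
```

Because op1 and op2 are computed row by row here, splitting is not strictly required for this particular example; the split step matters once the per-bin operation aggregates over rows (for instance, a sum or mean per bin).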
