Lucas
Lucas

Reputation: 1177

With R, iterate over data frames, perform math operations on each one, append results in new data frame

I have a huge data frame (A) in R that looks like this:

Letters   Frequency   Numbers
a         0.15        1
b         0.67        2
c         0.85        7
d         0.4         3

I would like to first divide A into 4 data frames according to the range of values of the "Frequency" column (4 frequency bins of size 0.25), which go from 0 to 1, so that I obtain:

A1 [0, 0.25]

Letters   Frequency   Numbers
a         0.15        1

A2 [0.25, 0.5]

Letters   Frequency   Numbers
d         0.4         3

A3 [0.5, 0.75]

Letters   Frequency   Numbers
b         0.67        2

A4 [0.75, 1]

Letters   Frequency   Numbers
c         0.85        7

In an iterative way and following the order of the frequency bins, I would like to perform math operations in A1, A2, A3 and A4 (e.g., op1 = Numbers - 2; op2 = Numbers * 10) and create a data frame B with the appended results:

B

bin            op1    op2   
[0, 0.25]      -1     10 
[0.25, 0.5]     1     30 
[0.5, 0.75]     0     20 
[0.75, 1]       5     70

I guess I don't need to create A1, A2, A3 and A4 (ideally I would like to work only with A) and that there is a more elegant way of getting B by iterating through the frequency bins directly, but I created them in this example to clearly explain the principle. I think this can be done with lapply, but I am not sure how. Thank you very much in advance.

Upvotes: 2

Views: 559

Answers (2)

lmo
lmo

Reputation: 38500

A base R method with the split-apply-combine methodology is

do.call(rbind, lapply(split(dd, findInterval(dd$Frequency, c(0, .25, .5, .75, 1))),
                      function(i) within(i, { # create variables, remove Numbers
                                         op1 <- Numbers - 2
                                         op2 <- Numbers * 10
                                         Numbers <- NULL})))

This returns

  Letters Frequency op2 op1
1       a      0.15  10  -1
2       d      0.40  30   1
3       b      0.67  20   0
4       c      0.85  70   5

Here, findInterval bins the observations based on Frequency, split splits the data.frame based on these bins and orders the data accordingly, lapply and within are used to construct the new variables for each group.

Upvotes: 2

MrFlick
MrFlick

Reputation: 206232

You can do this with a simple group_by in dplyr. For example with your data.

dd<-read.table(text="Letters   Frequency   Numbers
a         0.15        1
b         0.67        2
c         0.85        7
d         0.4         3", header=T)

You can run

library(dplyr)

dd %>% 
    group_by(bin=cut(Frequency, breaks=seq(0,1,by=.25))) %>% 
    transmute(op1=Numbers-2, op2=Numbers*10) %>% 
    arrange(bin)

We use cut() to create the binning groups and then use transmute() to create the new columns (while dropping the old).

Upvotes: 2

Related Questions