Reputation: 1177
I have a huge data frame (A) in R that looks like this:
Letters Frequency Numbers
a 0.15 1
b 0.67 2
c 0.85 7
d 0.4 3
I would like to first divide A into 4 data frames according to the range of values of the "Frequency" column (4 frequency bins of size 0.25), which go from 0 to 1, so that I obtain:
A1 [0, 0.25]
Letters Frequency Numbers
a 0.15 1
A2 [0.25, 0.5]
Letters Frequency Numbers
d 0.4 3
A3 [0.5, 0.75]
Letters Frequency Numbers
b 0.67 2
A4 [0.75, 1]
Letters Frequency Numbers
c 0.85 7
In an iterative way and following the order of the frequency bins, I would like to perform math operations in A1, A2, A3 and A4 (e.g., op1 = Numbers - 2; op2 = Numbers * 10) and create a data frame B with the appended results:
B
bin op1 op2
[0, 0.25] -1 10
[0.25, 0.5] 1 30
[0.5, 0.75] 0 20
[0.75, 1] 5 70
I guess I don't need to create A1, A2, A3 and A4 (ideally I would like to work only with A) and that there is a more elegant way of getting B by iterating through the frequency bins directly, but I created them in this example to clearly explain the principle. I think this can be done with lapply, but I am not sure how. Thank you very much in advance.
Upvotes: 2
Views: 559
Reputation: 38500
A base R method with the split-apply-combine methodology is
do.call(rbind, lapply(split(dd, findInterval(dd$Frequency, c(0, .25, .5, .75, 1))),
function(i) within(i, { # create variables, remove Numbers
op1 <- Numbers - 2
op2 <- Numbers * 10
Numbers <- NULL})))
This returns
Letters Frequency op2 op1
1 a 0.15 10 -1
2 d 0.40 30 1
3 b 0.67 20 0
4 c 0.85 70 5
Here, findInterval
bins the observations based on Frequency, split
splits the data.frame based on these bins and orders the data accordingly, lapply
and within
are used to construct the new variables for each group.
Upvotes: 2
Reputation: 206232
You can do this with a simple group_by
in dplyr. For example with your data.
dd<-read.table(text="Letters Frequency Numbers
a 0.15 1
b 0.67 2
c 0.85 7
d 0.4 3", header=T)
You can run
library(dplyr)
dd %>%
group_by(bin=cut(Frequency, breaks=seq(0,1,by=.25))) %>%
transmute(op1=Numbers-2, op2=Numbers*10) %>%
arrange(bin)
We use cut()
to create the binning groups and then use transmute()
to create the new columns (while dropping the old).
Upvotes: 2