Reputation: 770
I have a data frame like this:
Col-1: id.
Col-2: ranges from 0 to 100.
Col-3: value.
id col-2 value
...
id 10.00 2
id 10.53 2
id 11.11 88
id 11.76 6
id 12.00 2
id 12.12 2
id 12.35 163
id 12.50 6
id 12.90 2
id 13.33 5
id 13.58 366
id 13.64 8
id 14.29 10
id 14.81 725
...
id 100 45
I want to split Col-2 into 100 bins and sum the Col-3 values that fall in each interval. How can I do that? For example, the output would look something like this:
id 0-1 sum-value-in-interval
id 1-2 sum-value-in-interval
id 2-3 sum-value-in-interval
...
id 10-11 4
id 11-12 94
...
id 99-100 sum-value-in-interval
Thanks for the help!
Upvotes: 2
Views: 2930
Reputation: 887118
We can use cut to create a grouping variable from 'col2', then use that in aggregate to get the sum of 'col3' for each group.
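# breaks = 0:100 gives the 100 unit bins; include.lowest = TRUE keeps values equal to 0
df1$group <- as.character(cut(df1$col2, breaks = 0:100, include.lowest = TRUE))
aggregate(col3 ~ group + id, df1, FUN = sum)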
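As a rough check of the cut/aggregate approach against the sample rows in the question, here is a minimal sketch. It assumes left-closed bins (right = FALSE), since the expected output puts 10.00 in 10-11, and the "lower-upper" labels are an illustrative choice rather than part of the answer.

```r
# Sample rows copied from the question; the id column is the literal string "id".
df1 <- data.frame(
  id   = "id",
  col2 = c(10.00, 10.53, 11.11, 11.76, 12.00, 12.12, 12.35,
           12.50, 12.90, 13.33, 13.58, 13.64, 14.29, 14.81),
  col3 = c(2, 2, 88, 6, 2, 2, 163, 6, 2, 5, 366, 8, 10, 725)
)

# Left-closed unit bins [0,1), [1,2), ..., [99,100] (assumption: right = FALSE).
# include.lowest = TRUE keeps the endpoint 100 in the last bin.
df1$group <- cut(df1$col2, breaks = 0:100, right = FALSE,
                 include.lowest = TRUE,
                 labels = paste(0:99, 1:100, sep = "-"))

# Sum col3 within each id/bin pair; bins with no rows do not appear in the result.
res <- aggregate(col3 ~ id + group, data = df1, FUN = sum)
print(res)
```

On these rows the 10-11 and 11-12 bins sum to 4 and 94, which matches the expected output in the question.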
Or this can be done with data.table
library(data.table)
setDT(df1)[, group := cut(col2, breaks = 0:100, include.lowest = TRUE)
 ][, list(col3 = sum(col3)), by = .(group, id)]
# Example data used for the code above
set.seed(24)
df1 <- data.frame(id = paste0('id', rep(1:2, each = 50)),
                  col2 = rnorm(100, sample(100)),
                  col3 = sample(500, 100, replace = TRUE))
Upvotes: 5
Reputation: 2166
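This is a dplyr-based solution. Let your data be called dat. Note that ntile() splits the rows into 100 groups of roughly equal size by rank, i.e. percentile buckets rather than equal-width intervals such as 10-11, and the summary is not split by id: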
library(dplyr)
dat %>%
  mutate(quantile = ntile(col2, 100)) %>%
  group_by(quantile) %>%
  summarize(sumValueInInterval = sum(col3))
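If equal-width intervals per id are wanted rather than rank-based buckets, one possible dplyr sketch follows. The floor-based binning and the lower and interval column names are assumptions made for illustration, not part of the answer; the toy rows are the first five from the question.

```r
library(dplyr)

# First five sample rows from the question; column names follow the answer (col2, col3).
dat <- data.frame(
  id   = "id",
  col2 = c(10.00, 10.53, 11.11, 11.76, 12.00),
  col3 = c(2, 2, 88, 6, 2)
)

res <- dat %>%
  # floor() assigns each value to the unit interval [k, k + 1);
  # pmin() folds the endpoint 100 into the last bin, 99-100.
  mutate(lower = pmin(floor(col2), 99),
         interval = paste(lower, lower + 1, sep = "-")) %>%
  # Grouping by the numeric lower bound keeps the bins in numeric order.
  group_by(id, lower, interval) %>%
  summarise(sumValueInInterval = sum(col3), .groups = "drop")

print(res)
```

Here 10.00 and 10.53 land in 10-11 (sum 4) and 11.11 and 11.76 in 11-12 (sum 94), as in the expected output of the question.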
Upvotes: 6