Reputation: 325
I want to sum numbers by blocks, where a block is a run of consecutive rows with the same low/high values.
Here is some sample data:
data=matrix(c(0,0,0,1,1,0,1,1,1,1,1,0,0,1,0,0,1.2,2.3,1.3,1.5,2.5,2.1,2.3,1.2),
ncol=3,dimnames=list(c(),c("low","high","time")))
low high time
[1,] 0 1 1.2
[2,] 0 1 2.3
[3,] 0 1 1.3
[4,] 1 0 1.5
[5,] 1 0 2.5
[6,] 0 1 2.1
[7,] 1 0 2.3
[8,] 1 0 1.2
I want to get
n sum
[1,] 3 4.8
[2,] 2 4
[3,] 1 2.1
[4,] 2 3.5
without using any package. How can I do that in R?
Alternatively, this layout would also work:
n/low n/high sum
[1,] 0 3 4.8
[2,] 2 0 4
[3,] 0 1 2.1
[4,] 2 0 3.5
Upvotes: 2
Views: 645
Reputation: 325
I also found a similar option (df is the sample data converted with as.data.frame):
aggregate(df,list(c(0,cumsum(abs(diff(df$low))))),sum)[-1]
For me it is more straightforward to understand.
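To see why this works, it can help to print the grouping index on its own. This is only a sketch: the question defines `data`, so `df` (the matrix converted to a data frame), `grp`, and `res` are names assumed here for illustration.

```r
# Assumption: df is the question's sample matrix converted to a data frame.
data <- matrix(c(0,0,0,1,1,0,1,1, 1,1,1,0,0,1,0,0,
                 1.2,2.3,1.3,1.5,2.5,2.1,2.3,1.2),
               ncol = 3, dimnames = list(NULL, c("low", "high", "time")))
df <- as.data.frame(data)

# diff() is non-zero wherever `low` changes between consecutive rows;
# the cumulative sum of those change flags numbers the blocks 0, 1, 2, ...
grp <- c(0, cumsum(abs(diff(df$low))))
grp
# 0 0 0 1 1 2 3 3

# Summing every column within each block gives the second layout from the
# question: rows counted under low and high, plus the total time.
res <- aggregate(df, list(grp), sum)[-1]
res
```

Because `low` and `high` only contain 0 and 1, summing them inside a block is the same as counting rows of each kind.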
Upvotes: 0
Reputation: 23
I have solved the problem. My solution is a little complicated, but it works.
I generated each column using loops.
1) First I numbered the blocks, incrementing the index at every change:
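data <- data.frame(data)
ind1 <- numeric(nrow(data))
ind1[1] <- 1
# same (low, high) pair as the previous row: keep the block number, otherwise start a new block
for (i in 2:nrow(data))
  ind1[i] <- if (all(data[i, 1:2] == data[i - 1, 1:2])) ind1[i - 1] else ind1[i - 1] + 1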
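2) Then I computed the sums, also with a loop:
ind2 <- c(data[1, 3], 0, 0, 0)  # one slot per block (four in this data); row 1 opens block 1
k <- 1
for (i in 2:nrow(data)) {
  if (all(data[i, 1:2] == data[i - 1, 1:2])) {
    ind2[k] <- ind2[k] + data[i, 3]  # same block: keep adding
  } else {
    k <- k + 1                       # new block: move to the next slot
    ind2[k] <- ind2[k] + data[i, 3]
  }
}
result <- cbind(n = data.frame(table(ind1))$Freq, sum = ind2)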
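I originally got some warnings, because data[i,1:2]==data[i-1,1:2] compares two columns at once and yields two logical values where a single one is expected. Wrapping the comparison in all() avoids this; R 4.2 and later treat a length-two if() condition as an error rather than a warning.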
Upvotes: 0
Reputation: 28451
Not sure why the constraint on packages; they can make this much easier. We can create an index from the runs of consecutive identical combinations of the first two columns, then aggregate using that index for grouping. A final line sets up the names and the data frame structure:
ind <- with(rle(do.call(paste, df1[1:2])), rep(1:length(values), lengths))
a <- aggregate(df1$time, list(ind), function(x) c(length(x), sum(x)))[-1]
setNames(do.call(data.frame, a), c("n", "sum"))
n sum
1 3 4.8
2 2 4.0
3 1 2.1
4 2 3.5
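The index line is the dense part, so here is the same idea unpacked step by step. This is a sketch under assumptions: `df1` is taken to be the question's sample matrix as a data frame, and `key`, `runs`, and `out` are illustrative names that are not part of the answer above.

```r
# Assumption: df1 is the question's sample matrix converted to a data frame.
data <- matrix(c(0,0,0,1,1,0,1,1, 1,1,1,0,0,1,0,0,
                 1.2,2.3,1.3,1.5,2.5,2.1,2.3,1.2),
               ncol = 3, dimnames = list(NULL, c("low", "high", "time")))
df1 <- as.data.frame(data)

# One string per row describing its (low, high) combination.
key <- do.call(paste, df1[1:2])

# rle() finds runs of consecutive identical keys.
runs <- rle(key)

# Repeat each run's number as many times as the run is long.
ind <- rep(seq_along(runs$values), runs$lengths)
ind
# 1 1 1 2 2 3 4 4

# The run lengths are already the block sizes, so only the sums
# still need to be computed per block.
out <- cbind(n = runs$lengths, sum = tapply(df1$time, ind, sum))
out
```

Using `rle` on the pasted key means a block ends whenever either column changes, not just `low`.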
To illustrate how simple it is with help from data.table:
library(data.table)
setDT(df1)[, .(.N, sum(time)), by=rleid(low, high)]
Update
For the follow-up question, see @bgoldst's answer in the comments.
Upvotes: 9
Reputation: 180977
A similar option, also using aggregate:
aggregate(cbind(n=1,sum=df$time),
by=list(c(0, cumsum(abs(diff(df$low))))),
FUN=sum)[-1]
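Note that this index only watches `low`, which is enough for the sample data because `high` is always `1 - low` there. If the two columns could change independently, one possible variant is to flag a change in either column. This is a sketch, not part of the answer above; `df` is assumed to be the sample matrix as a data frame, and `changed`, `grp`, and `res` are names made up for illustration.

```r
# Assumption: df is the question's sample matrix converted to a data frame.
data <- matrix(c(0,0,0,1,1,0,1,1, 1,1,1,0,0,1,0,0,
                 1.2,2.3,1.3,1.5,2.5,2.1,2.3,1.2),
               ncol = 3, dimnames = list(NULL, c("low", "high", "time")))
df <- as.data.frame(data)

# TRUE wherever either column differs from the previous row.
changed <- diff(df$low) != 0 | diff(df$high) != 0

# Cumulative count of changes numbers the blocks 0, 1, 2, ...
grp <- c(0, cumsum(changed))

# A constant column of 1s summed per block gives the row count n.
res <- aggregate(cbind(n = 1, sum = df$time), by = list(grp), FUN = sum)[-1]
res
```

On the sample data this yields the same four blocks as the `diff(df$low)` version.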
Upvotes: 3