Reputation: 3458
This is a follow-up to this question: Sum object in a column between an interval defined by another column
What I would like to know is how to adjust the answer if I want to sum the values in B for rows where (A[i+1]-A[i]==0) or (A[i+1]-A[i]==1) or (A[i]-A[i-1]==0) or (A[i]-A[i-1]==1), where i is the row index. So basically, sum the B rows for A-s that have the same value +/- 1, but without summing the same row twice?
I tried building a loop function but I get stuck when using row indices with data frames. Example: If the following data frame is given
df
A B
[1,] 1 4
[2,] 1 3
[3,] 3 5
[4,] 3 7
[5,] 4 3
[6,] 5 2
What I want to obtain is the next data frame:
df
A B
[1,] 1 7
[2,] 3 15
[3,] 5 2
Moreover, I have a large data frame like this:
df
chr start stop m n s
chr1 71533361 71533362 23 1 -
chr1 71533361 71533362 24 26 -
chr1 71533361 71533362 25 1 -
and I want my result to look like this (I chose the row for which the value in column m is max):
df
chr1 71533361 71533362 24 28 -
Upvotes: 0
Views: 1847
Reputation: 44527
Try the following, assuming your original dataframe is df:
df2 <- df                                  # working copy; summed rows are removed from it
z <- data.frame(A = numeric(0), B = numeric(0))  # output data frame, filled one row at a time
j <- 1                                     # output row index
u <- unique(df$A)                          # unique values of A
i <- min(u)                                # start from the smallest value of A
s <- TRUE                                  # loop flag for while()
while (s) {
  z[j, ] <- c(i, sum(df2[df2$A %in% c(i-1, i, i+1), 2]))  # sum B where A is within +/- 1 of i
  df2 <- df2[!df2$A %in% c(i-1, i, i+1), ]                # drop those rows so none is summed twice
  j <- j + 1                               # advance the output index
  u <- u[!u %in% c(i-1, i, i+1)]           # remove the handled values from u
  if (length(u) == 0) {                    # nothing left: exit the loop
    s <- FALSE
  } else {
    i <- min(u)                            # next smallest unhandled value
  }
}
I know that's kind of messy code, but it's a sort of tough problem given all of the different indices.
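For anyone who wants to try this on the example from the question, here is a minimal sketch of the same idea wrapped in a function. The name sum_by_window is made up for illustration, and the sketch assumes the columns are called A and B and that A holds integer values.

```r
# Example data from the question
df <- data.frame(A = c(1, 1, 3, 3, 4, 5),
                 B = c(4, 3, 5, 7, 3, 2))

# sum_by_window is a hypothetical helper name, not from any package
sum_by_window <- function(d) {
  keys <- numeric(0)                        # value of A that labels each group
  sums <- numeric(0)                        # summed B for each group
  while (nrow(d) > 0) {
    i <- min(d$A)                           # smallest value of A not yet handled
    hit <- d$A %in% c(i - 1, i, i + 1)      # rows within +/- 1 of that value
    keys <- c(keys, i)
    sums <- c(sums, sum(d$B[hit]))
    d <- d[!hit, , drop = FALSE]            # remove them so no row is summed twice
  }
  data.frame(A = keys, B = sums)
}

res <- sum_by_window(df)
print(res)
```

On the example data this should give A = 1, 3, 5 with B = 7, 15, 2, which matches the result asked for in the question.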
Upvotes: 1
Reputation: 1
I would create a for loop that tests whether A[i] - A[i-1] meets your criteria.
If that is true, it adds B[i] to a sum variable and continues through the rows.
Because i is just iterating through A[] it shouldn't count anything from B[] twice.
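A rough sketch of such a loop, under a few assumptions: A is sorted ascending and integer-valued, and the columns are named A and B. One caveat: comparing only adjacent rows (A[i] - A[i-1]) would chain 3, 4 and 5 into a single group, which is not what the expected output in the question shows. So this sketch compares each row against the first A value of the current group instead; that is my adjustment, not something stated above.

```r
# Example data from the question
df <- data.frame(A = c(1, 1, 3, 3, 4, 5),
                 B = c(4, 3, 5, 7, 3, 2))
df <- df[order(df$A), ]                     # the single pass below assumes sorted A

keys <- numeric(0)                          # first A value of each group
sums <- numeric(0)                          # running sum of B for each group

for (i in seq_len(nrow(df))) {
  g <- length(keys)                         # index of the current (last) group
  if (g > 0 && df$A[i] - keys[g] <= 1) {
    sums[g] <- sums[g] + df$B[i]            # within 1 of the group's first A: add to it
  } else {
    keys <- c(keys, df$A[i])                # otherwise start a new group at this row
    sums <- c(sums, df$B[i])
  }
}

res_loop <- data.frame(A = keys, B = sums)
print(res_loop)
```

Each row is visited exactly once, so nothing from B is counted twice.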
Upvotes: 0