Massive_Shed

Reputation: 47

Sum of data frame's rows in range defined by columns

I have an integer-based data frame with positional coordinates in one column and a variable in the second. The coordinates range from 1 to 10 million and the variable from 0 to 950. I'm interested in returning the sum of the variable over ranges defined in a separate data frame that contains the start and end points of each desired range.

To make things a bit easier to compute, I've shortened the example:

Data:
a = seq(1,5)
b = c(0,0,1,0,2)
df1 <- data.frame(a, b)

c = c(1,1,2,2,3)
d = c(3,4,3,5,4)
df2 <- data.frame(c,d)

df1:
1, 0
2, 0
3, 1
4, 0
5, 2

df2:
1, 3
1, 4
2, 3
2, 5
3, 4

magic

output:
1,
1,
1,
3,
1,

Where magic pulls the start and end positions from columns 1 and 2 of df2 and passes them to rowSums to extract and sum the corresponding rows of df1.
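With coordinates up to 10 million and potentially many ranges, a prefix-sum (cumulative sum) approach may scale better than summing each range separately. The sketch below is not taken from the answers; it assumes df1 can be sorted by a and uses base R's cumsum() and findInterval(), so each range costs two binary searches instead of a scan. The helper name range_sums is hypothetical.

```r
# Example data from the question
df1 <- data.frame(a = seq(1, 5), b = c(0, 0, 1, 0, 2))
df2 <- data.frame(c = c(1, 1, 2, 2, 3), d = c(3, 4, 3, 5, 4))

# Hypothetical helper: for each range [start, end], sum val over positions in it
range_sums <- function(pos, val, start, end) {
  ord <- order(pos)               # findInterval() needs sorted positions
  pos <- pos[ord]
  cs  <- c(0, cumsum(val[ord]))   # cs[k + 1] = sum of the first k sorted values
  hi  <- findInterval(end, pos)                      # count of positions <= end
  lo  <- findInterval(start, pos, left.open = TRUE)  # count of positions < start
  cs[hi + 1] - cs[lo + 1]
}

res <- range_sums(df1$a, df1$b, df2$c, df2$d)
print(res)
# [1] 1 1 1 3 1
```

Sorting costs O(n log n) once; after that, each range is answered from the cumulative sums, and duplicate coordinates in a are handled correctly by the two counts.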

Upvotes: 1

Views: 79

Answers (2)

YOLO

Reputation: 21709

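Edit: @Frank's data.table solution is short and fast. It uses a non-equi join, so both frames must be data.tables first:

library(data.table)
setDT(df1)
setDT(df2)
df2[, s := df1[df2, on=.(a >= c, a <= d), sum(b), by=.EACHI]$V1]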

    # output
       c d s
    1: 1 3 1
    2: 1 4 1
    3: 2 3 1
    4: 2 5 3
    5: 3 4 1
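The non-equi join matches each row of df2 to every row of df1 with c <= a <= d, and by = .EACHI then sums b separately for each row of df2. For small inputs, the same matching can be illustrated in base R as a logical matrix multiplied by b. This is a swapped-in illustration, not Frank's code, and it allocates an nrow(df2) x nrow(df1) matrix, so it is only suitable for small data, not the 10-million-coordinate case.

```r
df1 <- data.frame(a = seq(1, 5), b = c(0, 0, 1, 0, 2))
df2 <- data.frame(c = c(1, 1, 2, 2, 3), d = c(3, 4, 3, 5, 4))

# in_range[i, j] is TRUE when df1$a[j] lies in [df2$c[i], df2$d[i]]
in_range <- outer(df2$c, df1$a, "<=") & outer(df2$d, df1$a, ">=")

# The matrix product sums b over the matched rows of df1 for each range
s <- as.vector(in_range %*% df1$b)
print(s)
# [1] 1 1 1 3 1
```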

Another way (probably slower, but it works):

library(data.table)
setDT(df1)
setDT(df2)

## magic function: x is one row of df2 (start in x[1], end in x[2])
get_magic <- function(x)
{
    one <- x[1]
    two <- x[2]

    # sum b over the rows of df1 whose position a lies in [one, two]
    df1[between(a, one, two), sum(b)]
}


# apply get_magic to each row of df2 (apply converts df2 to a matrix first)
d <- apply(df2, 1, get_magic)

print(d)
# output
[1] 1 1 1 3 1

Upvotes: 1

MKR

Reputation: 20085

One possible solution is to use mapply. I have used a custom function, but you can also write an anonymous function inline in the mapply call.

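row_sum <- function(x, y){
  # Sum column 2 (b) of df1 over rows x to y. This indexes rows by
  # position, so it relies on a being equal to the row number, as here.
  sum(df1[x:y, 2])
}

mapply(row_sum, df2$c, df2$d)

#Result
#[1] 1 1 1 3 1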

Data

a = seq(1,5)
b = c(0,0,1,0,2)
df1 <- data.frame(a, b)

c = c(1,1,2,2,3)
d = c(3,4,3,5,4)
df2 <- data.frame(c,d)
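As mentioned above, the helper can also be written inline in the mapply call. The sketch below additionally filters by the value of a rather than by row position, so it does not rely on a matching the row numbers (true in this example, but not necessarily for real coordinate data).

```r
df1 <- data.frame(a = seq(1, 5), b = c(0, 0, 1, 0, 2))
df2 <- data.frame(c = c(1, 1, 2, 2, 3), d = c(3, 4, 3, 5, 4))

# Anonymous function: sum b over the rows of df1 whose a lies in [x, y]
res <- mapply(function(x, y) sum(df1$b[df1$a >= x & df1$a <= y]),
              df2$c, df2$d)
print(res)
# [1] 1 1 1 3 1
```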

Upvotes: 0
