Reputation: 47
I have an integer-based data frame with positional coordinates in one column and a variable in the second. The coordinates range from 1 to 10 million and the variable from 0 to 950. I want to return the sum of the variable over ranges defined in a separate data frame that contains the start and end points of each range.
To make things easier to compute, I've shortened the example:
Data:
a = seq(1,5)
b = c(0,0,1,0,2)
df1 <- data.frame(a, b)
c = c(1,1,2,2,3)
d = c(3,4,3,5,4)
df2 <- data.frame(c,d)
df1:
1, 0
2, 0
3, 1
4, 0
5, 2
df2:
1, 3
1, 4
2, 3
2, 5
3, 4
magic
output:
1,
1,
1,
3,
1,
Here, "magic" takes the start and end positions from columns 1 and 2 of df2 and passes them to something like rowSums to extract and sum the matching values from df1.
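One way to read the rowSums idea is as a 0/1 membership matrix: one row per position in df1, one column per range in df2. Below is a minimal base-R sketch of that reading, assuming the column names from the example (a, b in df1; c, d in df2); the variable names in_range and magic are hypothetical. It builds an n-by-m matrix, so it is only practical for small inputs, not for 10 million positions.

```r
# Example data from the question
df1 <- data.frame(a = 1:5, b = c(0, 0, 1, 0, 2))
df2 <- data.frame(c = c(1, 1, 2, 2, 3), d = c(3, 4, 3, 5, 4))

# in_range[i, j] is TRUE when position df1$a[i] lies in [df2$c[j], df2$d[j]]
in_range <- outer(df1$a, df2$c, ">=") & outer(df1$a, df2$d, "<=")

# Multiply each row i by b[i] (column-major recycling), then sum each column:
# one column per range in df2
magic <- colSums(in_range * df1$b)
magic
# [1] 1 1 1 3 1
```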
Upvotes: 1
Views: 79
Reputation: 21709
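Edit: @Frank's data.table solution, short and fast:
library(data.table)
setDT(df1)
setDT(df2)
# non-equi join: for each row of df2, sum b over the rows of df1 with c <= a <= d
df2[, s := df1[df2, on = .(a >= c, a <= d), sum(b), by = .EACHI]$V1]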
# output
c d s
1: 1 3 1
2: 1 4 1
3: 2 3 1
4: 2 5 3
5: 3 4 1
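With up to 10 million positions, a different technique (not used in the answers here) may also be worth trying: compute a cumulative sum of b once, so that each range sum becomes the difference of two lookups. A minimal base-R sketch, assuming df1 is sorted by a and every range has start <= end; the helper name prefix_range_sums is hypothetical. findInterval counts how many positions fall at or below each bound.

```r
# Example data from the question (df1 must be sorted by position a)
df1 <- data.frame(a = 1:5, b = c(0, 0, 1, 0, 2))
df2 <- data.frame(c = c(1, 1, 2, 2, 3), d = c(3, 4, 3, 5, 4))

# Hypothetical helper: range sums via a prefix (cumulative) sum
prefix_range_sums <- function(pos, val, start, end) {
  # as.numeric avoids integer overflow in cumsum for large totals
  cs <- c(0, cumsum(as.numeric(val)))
  hi <- findInterval(end, pos)                     # number of positions <= end
  lo <- findInterval(start, pos, left.open = TRUE) # number of positions < start
  cs[hi + 1] - cs[lo + 1]
}

prefix_range_sums(df1$a, df1$b, df2$c, df2$d)
# [1] 1 1 1 3 1
```

The cumulative sum is built once in O(n), and each range then costs two binary searches, instead of scanning all of df1 per range.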
Another way (possibly slower, but it works):
library(data.table)
setDT(df1)
setDT(df2)
## magic function: sum b over the rows of df1 whose position a lies in [x[1], x[2]]
get_magic <- function(x)
{
  one <- x[1]  # range start (column c of df2)
  two <- x[2]  # range end (column d of df2)
  df1[between(a, one, two), sum(b)]
}
# apply get_magic to each row of df2
d <- apply(df2, 1, get_magic)
print(d)
# output
[1] 1 1 1 3 1
Upvotes: 1
Reputation: 20085
One possible solution is to use mapply. I have used a custom function, but one can also write an inline function as part of the mapply statement.
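a = seq(1,5)
b = c(0,0,1,0,2)
df1 <- data.frame(a, b)
c = c(1,1,2,2,3)
d = c(3,4,3,5,4)
df2 <- data.frame(c,d)
# sum column 2 (b) of df1 over rows x..y; this indexes by row number,
# so it assumes row i of df1 holds position a == i
row_sum <- function(x, y){
  sum(df1[x:y,2])
}
mapply(row_sum, df2$c, df2$d)
#Result
#[1] 1 1 1 3 1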
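As the answer notes, the helper can also be written inline. A minimal sketch of that variant, assuming the same df1 and df2; unlike row_sum, it selects rows by the value of the position column a rather than by row number, so it should still work when positions do not start at 1 or have gaps.

```r
df1 <- data.frame(a = 1:5, b = c(0, 0, 1, 0, 2))
df2 <- data.frame(c = c(1, 1, 2, 2, 3), d = c(3, 4, 3, 5, 4))

# Anonymous function passed directly to mapply; it keeps the rows of df1
# whose position a lies in [x, y] and sums their b values
sums <- mapply(function(x, y) sum(df1$b[df1$a >= x & df1$a <= y]),
               df2$c, df2$d)
sums
# [1] 1 1 1 3 1
```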
Upvotes: 0