Reputation: 1
In R, I start from a data frame with two variables, z and p (p holds the weights). I need to compute this sum:
∑_i ∑_j ((z_i - z_j)·p_i·p_j·I_z)
where I_z is an indicator that equals -1 if z_i < z_j and 1 otherwise. Please note that the data are large: the data frame can have 10,000 rows. I tried a matrix approach but ran out of memory, so I think I am forced to use for loops. Any suggestions? Thank you, Elena
Upvotes: 0
Views: 65
Reputation: 132969
Your "indicator" is just a fancy way of defining the abs function: (z_i - z_j)·I_z is the same as |z_i - z_j|.
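A quick numeric check of that identity, as a minimal sketch. The vectors `z_i` and `z_j` below are made-up illustration values, not data from the question:

```r
# Hypothetical paired values covering z_i < z_j, z_i > z_j and z_i == z_j
z_i <- c(1, 5, 3, 3)
z_j <- c(4, 2, 3, 7)

# Indicator as defined in the question: -1 if z_i < z_j, 1 otherwise
I_z <- ifelse(z_i < z_j, -1, 1)

lhs <- (z_i - z_j) * I_z   # signed difference times the indicator
rhs <- abs(z_i - z_j)      # plain absolute difference

stopifnot(identical(lhs, rhs))
```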
You can use outer if you have sufficient RAM:
set.seed(2)
n <- 2
DF <- data.frame(z=sample(1:2, n, TRUE),
                 p=sample(1:2, n, TRUE))
#  z p
#1 1 2
#2 2 1
sum(outer(seq_len(nrow(DF)), seq_len(nrow(DF)), function(i, j) {
  abs(DF$z[i] - DF$z[j]) * DF$p[i] * DF$p[j]
}))
#[1] 4
n <- 1e4
DF <- data.frame(z=sample(1:2, n, TRUE),
                 p=sample(1:2, n, TRUE))
sum(outer(seq_len(nrow(DF)), seq_len(nrow(DF)), function(i, j) {
  abs(DF$z[i] - DF$z[j]) * DF$p[i] * DF$p[j]
}))
#[1] 112224330
If you don't, you need a loop. Using combn is one possibility, but it is slow because it is essentially a loop:
# combn visits each unordered pair once, so double the sum to count (i, j) and (j, i)
2 * sum(combn(seq_len(nrow(DF)), 2, function(ind) {
  abs(DF$z[ind[1]] - DF$z[ind[2]]) * DF$p[ind[1]] * DF$p[ind[2]]
}))
#[1] 112224330
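Two further options that avoid building an n × n matrix, offered as a sketch rather than a definitive implementation. The function names `pair_sum_rowwise` and `pair_sum_sorted` are hypothetical helpers, not part of base R. The first loops over rows and vectorises the inner sum, so it needs only O(n) memory. The second is a different technique: it sorts by z and uses cumulative sums, relying on the fact that after sorting, |z_i - z_j| = z_i - z_j for every j that comes before i.

```r
# O(n^2) time, O(n) memory: one vectorised inner sum per row
pair_sum_rowwise <- function(z, p) {
  z <- as.numeric(z)
  p <- as.numeric(p)
  total <- 0
  for (i in seq_along(z)) {
    total <- total + p[i] * sum(abs(z[i] - z) * p)
  }
  total
}

# O(n log n) time, O(n) memory: sort by z, then use running totals
pair_sum_sorted <- function(z, p) {
  ord <- order(z)
  z <- as.numeric(z[ord])
  p <- as.numeric(p[ord])
  p_before  <- cumsum(p) - p          # sum of p_j over j before i in sorted order
  zp_before <- cumsum(z * p) - z * p  # sum of z_j * p_j over j before i
  # Each unordered pair is counted once here, so double it for (i, j) and (j, i)
  2 * sum(p * (z * p_before - zp_before))
}

# Same two-row data as the small example above: z = (1, 2), p = (2, 1)
pair_sum_rowwise(c(1, 2), c(2, 1))
#[1] 4
pair_sum_sorted(c(1, 2), c(2, 1))
#[1] 4
```

With a data frame, call them as `pair_sum_sorted(DF$z, DF$p)`. Ties in z need no special handling because tied pairs contribute zero to the sum.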
Upvotes: 2