Reputation: 1361
I have the following data.table, named dt:
library(data.table)
set.seed(1)
dt <- data.table(expand.grid(c("a","b"),1:2,1:2,c("M","N","O","P","Q")))
dt$perf <- rnorm(nrow(dt),0,.01)
colnames(dt) <- c("ticker","par1","par2","row_names","perf")
My goal is to iterate through all combinations of par1 and par2 by row_names and pick the one that maximizes cumprod(mean(perf)+1)-1.
Let's look at the data so this makes more sense visually.
dt[order(row_names,ticker,par1,par2)]
ticker par1 par2 row_names perf
1: a 1 1 M 0.011462284
2: a 1 2 M -0.004252677
3: a 2 1 M 0.005727396
4: a 2 2 M -0.003892372
5: b 1 1 M -0.024030962
6: b 1 2 M 0.009510128
7: b 2 1 M 0.003747244
8: b 2 2 M -0.002843307
For each ticker and row_names we have 2 x 2 = 4 combinations of par1 and par2, namely (1,1), (1,2), (2,1), (2,2).
I would like to average the perf for ticker = a, par1 = 1, par2 = 1 with the perf for each of the combinations for ticker = b. Using numbers from the table above:
res
a_perf b_perf
1: 0.01146228 -0.024030962
2: 0.01146228 0.009510128
3: 0.01146228 0.003747244
4: 0.01146228 -0.002843307
apply(res,1,mean)
[1] -0.006284339 0.010486206 0.007604764 0.004309488
Then we repeat this process for ticker = a, par1 = 1, par2 = 2 against every combination for ticker = b, and so on for all combinations of par1 and par2 within each row_names.
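The pairing described above can be sketched directly in data.table. This is only an illustration, not the approach used later in the thread: the helper tables a and b, the result table pairs, and the column names a_perf, b_perf and pair_mean are made up here, and the data is rebuilt from the set.seed(1) code at the top of the question.

```r
library(data.table)

# Rebuild the example data from the question.
set.seed(1)
dt <- data.table(expand.grid(c("a", "b"), 1:2, 1:2, c("M", "N", "O", "P", "Q")))
dt$perf <- rnorm(nrow(dt), 0, .01)
colnames(dt) <- c("ticker", "par1", "par2", "row_names", "perf")

# Split by ticker (hypothetical helper tables and column names).
a <- dt[ticker == "a", .(row_names, a_par1 = par1, a_par2 = par2, a_perf = perf)]
b <- dt[ticker == "b", .(row_names, b_par1 = par1, b_par2 = par2, b_perf = perf)]

# Join on row_names so every "a" row meets every "b" row of the same row_names.
pairs <- merge(a, b, by = "row_names", allow.cartesian = TRUE)

# Mean perf of each (a, b) pair.
pairs[, pair_mean := (a_perf + b_perf) / 2]
```

Each row_names then has 4 x 4 = 16 rows in pairs, one per combination of an a setting with a b setting, and pairs[, .SD[which.max(pair_mean)], by = row_names] would pick the best pair together with its par1 and par2 values.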
EDIT::: Using @earch's suggestion we get the following:
tmp <- lapply(split(dt, dt$row_names), calcCombMeans)
$M
a.row b.row mean
1 1 2 -0.0022140524
2 3 2 -0.0032599264
3 5 2 0.0025657555
4 7 2 0.0033553619
5 1 4 0.0048441350
6 3 4 0.0037982609
7 5 4 0.0096239429
8 7 4 0.0104135493
9 1 6 -0.0072346110
10 3 6 -0.0082804850
11 5 6 -0.0024548031
12 7 6 -0.0016651967
13 1 8 0.0005593545
14 3 8 -0.0004865195
15 5 8 0.0053391624
16 7 8 0.0061287688
From here, I would like to pick the max(mean) for each of the row_names M, N, O, P, Q. If I did not care about referencing indices later on, one way to do that would be:
res <- sapply(seq_along(tmp), function(i) which.max(tmp[[i]]$mean))
[1] 8 6 3 12 16
This is how I would calculate my desired end result in full:
res <- rbindlist(tmp, idcol="row_names")
res <- res[,list(best=max(mean),best_idx = which.max(mean)),by=row_names]
row_names best best_idx
1: M 0.010413549 8
2: N 0.009508122 6
3: O 0.009314068 3
4: P 0.008883106 12
5: Q 0.009316006 16
I haven't decided whether I need the best_idx information (I probably will, in order to replicate the exact calculation for a specific row_names), but using this res, I can calculate my cumRet by doing:
res[,cumRet:= cumprod(best+1)-1]
> res
row_names best best_idx cumRet
1: M 0.010413549 8 0.01041355
2: N 0.009508122 6 0.02002068
3: O 0.009314068 3 0.02952123
4: P 0.008883106 12 0.03866657
5: Q 0.009316006 16 0.04834280
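The cumRet column compounds the per-row best values. Written out, with r_i standing for the best value of the i-th row_names and R_k for the cumulative return after k rows (both symbols are introduced here only for illustration), the second row of the table above follows as:

```latex
R_k = \prod_{i=1}^{k} (1 + r_i) - 1, \qquad
R_2 = (1 + 0.010413549)(1 + 0.009508122) - 1 \approx 0.02002068
```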
@earch's answer really helps in seeing the process of calculating all these combinations. I was wondering if there is a more efficient solution using data.table's functionality. My real data set is much larger than this (millions of rows), and the combinations will start to take a toll.
EDIT #2::: After being able to step through the process, I have figured out a very fast solution!
tmp <- dt[,list(par1=par1[which.max(perf)],par2=par2[which.max(perf)],perf=max(perf)),by=list(ticker,row_names)]
res <- tmp[,list(perf=mean(perf),par1= paste(par1,collapse=","),par2=paste(par2,collapse=",")),by=row_names]
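Using data.table allows me to calculate the max perf for each ticker and row_names combination. This works because the mean of an (a, b) pair is largest when each ticker's perf is at its own maximum, so the pairs never need to be enumerated. After that, I can group by row_names, and it gets the same results!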
> res
row_names perf par1 par2
1: M 0.010413549 2,2 2,1
2: N 0.009508122 2,2 1,1
3: O 0.009314068 1,1 2,1
4: P 0.008883106 2,1 2,2
5: Q 0.009316006 2,2 2,2
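As a sanity check, the shortcut can be compared against an exhaustive search over all (a, b) pairs on the example data. This is a minimal sketch; the names fast and brute are made up for the comparison, and the data is rebuilt from the set.seed(1) code at the top of the question.

```r
library(data.table)

# Rebuild the example data from the question.
set.seed(1)
dt <- data.table(expand.grid(c("a", "b"), 1:2, 1:2, c("M", "N", "O", "P", "Q")))
dt$perf <- rnorm(nrow(dt), 0, .01)
colnames(dt) <- c("ticker", "par1", "par2", "row_names", "perf")

# Shortcut: best perf per (ticker, row_names), then average across tickers.
fast <- dt[, .(perf = max(perf)), by = .(ticker, row_names)][
  , .(best = mean(perf)), by = row_names]

# Exhaustive: mean of every (a, b) pair within a row_names, then take the max.
brute <- dt[, .(best = max(outer(perf[ticker == "a"], perf[ticker == "b"], "+") / 2)),
            by = row_names]

# TRUE when both approaches agree up to floating-point tolerance.
all.equal(fast$best, brute$best)
```

The shortcut touches each row once, whereas the exhaustive version builds a full a-by-b matrix of pairs for every row_names, which is the part that would become expensive on millions of rows.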
Upvotes: 0
Views: 498
Reputation: 176
I'm not sure what values the cumulative product is being taken over, but here's a function that calculates the mean between all perf combinations of a and b within a row_names. It should give you what you need to finish the task:
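calcCombMeans <- function(dt) {
  # Row indices for each ticker within this row_names group
  a.rows <- which(dt$ticker == "a")
  b.rows <- which(dt$ticker == "b")
  # Every pairing of an "a" row with a "b" row
  rep.rows <- expand.grid(a.row = a.rows, b.row = b.rows)
  # Mean perf of each pairing
  rep.rows$mean <- sapply(seq_len(nrow(rep.rows)), function(i) {
    mean(dt$perf[unlist(rep.rows[i, ])])
  })
  # For each row, collect the means of all pairings it takes part in
  # (base subsetting, so dplyr's filter() does not need to be loaded)
  dt$means <- lapply(seq_len(nrow(dt)), function(i) {
    if (dt$ticker[i] == "a") {
      rep.rows$mean[rep.rows$a.row == i]
    } else {
      rep.rows$mean[rep.rows$b.row == i]
    }
  })
  dt
}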
do.call(rbind, lapply(split(dt, dt$row_names), calcCombMeans))
ticker par1 par2 row_names perf
1: a 1 1 M -0.0062645381
2: b 1 1 M 0.0018364332
3: a 2 1 M -0.0083562861
4: b 2 1 M 0.0159528080
5: a 1 2 M 0.0032950777
6: b 1 2 M -0.0082046838
7: a 2 2 M 0.0048742905
8: b 2 2 M 0.0073832471
9: a 1 1 N 0.0057578135
10: b 1 1 N -0.0030538839
11: a 2 1 N 0.0151178117
12: b 2 1 N 0.0038984324
13: a 1 2 N -0.0062124058
14: b 1 2 N -0.0221469989
15: a 2 2 N 0.0112493092
16: b 2 2 N -0.0004493361
17: a 1 1 O -0.0001619026
18: b 1 1 O 0.0094383621
19: a 2 1 O 0.0082122120
20: b 2 1 O 0.0059390132
21: a 1 2 O 0.0091897737
22: b 1 2 O 0.0078213630
23: a 2 2 O 0.0007456498
24: b 2 2 O -0.0198935170
25: a 1 1 P 0.0061982575
26: b 1 1 P -0.0005612874
27: a 2 1 P -0.0015579551
28: b 2 1 P -0.0147075238
29: a 1 2 P -0.0047815006
30: b 1 2 P 0.0041794156
31: a 2 2 P 0.0135867955
32: b 2 2 P -0.0010278773
33: a 1 1 Q 0.0038767161
34: b 1 1 Q -0.0005380504
35: a 2 1 Q -0.0137705956
36: b 2 1 Q -0.0041499456
37: a 1 2 Q -0.0039428995
38: b 1 2 Q -0.0005931340
39: a 2 2 Q 0.0110002537
40: b 2 2 Q 0.0076317575
ticker par1 par2 row_names perf
means
1: -0.0022140524, 0.0048441350,-0.0072346110, 0.0005593545
2: -0.002214052,-0.003259926, 0.002565755, 0.003355362
3: -0.0032599264, 0.0037982609,-0.0082804850,-0.0004865195
4: 0.004844135,0.003798261,0.009623943,0.010413549
5: 0.002565755, 0.009623943,-0.002454803, 0.005339162
6: -0.007234611,-0.008280485,-0.002454803,-0.001665197
7: 0.003355362, 0.010413549,-0.001665197, 0.006128769
8: 0.0005593545,-0.0004865195, 0.0053391624, 0.0061287688
9: 0.001351965, 0.004828123,-0.008194593, 0.002654239
10: 0.001351965, 0.006031964,-0.004633145, 0.004097713
11: 0.006031964, 0.009508122,-0.003514594, 0.007334238
12: 0.004828123, 0.009508122,-0.001156987, 0.007573871
13: -0.004633145,-0.001156987,-0.014179702,-0.003330871
14: -0.008194593,-0.003514594,-0.014179702,-0.005448845
15: 0.004097713, 0.007573871,-0.005448845, 0.005399987
16: 0.002654239, 0.007334238,-0.003330871, 0.005399987
17: 0.004638230, 0.002888555, 0.003829730,-0.010027710
18: 0.004638230,0.008825287,0.009314068,0.005092006
19: 0.008825287, 0.007075613, 0.008016787,-0.005840653
20: 0.002888555,0.007075613,0.007564393,0.003342332
21: 0.009314068, 0.007564393, 0.008505568,-0.005351872
22: 0.003829730,0.008016787,0.008505568,0.004283506
23: 0.005092006, 0.003342332, 0.004283506,-0.009573934
24: -0.010027710,-0.005840653,-0.005351872,-0.009573934
25: 0.002818485,-0.004254633, 0.005188837, 0.002585190
26: 0.002818485,-0.001059621,-0.002671394, 0.006512754
27: -0.001059621,-0.008132739, 0.001310730,-0.001292916
28: -0.0042546332,-0.0081327395,-0.0097445122,-0.0005603642
29: -0.0026713940,-0.0097445122,-0.0003010425,-0.0029046889
30: 0.0051888365, 0.0013107303,-0.0003010425, 0.0088831056
31: 0.0065127541,-0.0005603642, 0.0088831056, 0.0062794591
32: 0.002585190,-0.001292916,-0.002904689, 0.006279459
33: 0.0016693329,-0.0001366148, 0.0016417911, 0.0057542368
34: 0.001669333,-0.007154323,-0.002240475, 0.005231102
35: -0.007154323,-0.008960271,-0.007181865,-0.003069419
36: -0.0001366148,-0.0089602706,-0.0040464226, 0.0034251540
37: -0.002240475,-0.004046423,-0.002268017, 0.001844429
38: 0.001641791,-0.007181865,-0.002268017, 0.005203560
39: 0.005231102,0.003425154,0.005203560,0.009316006
40: 0.005754237,-0.003069419, 0.001844429, 0.009316006
means
Upvotes: 1