Reputation: 14317
I have a datetime list (which for some reason I call it column date
) containing over 1k datetime.
adates:2017.10.20T00:02:35.650 2017.10.20T01:57:13.454 ...
For each of these dates I need to select the data from some table, then pivotize by a column t
i.e. expiry, add the corresponding date
datetime as column to the pivotized table and stitch together the pivotization for all the dates. Note that I should be able to identify which pivotization corresponds to a date and that's why I do it one by one:
fPivot:{[adate;accypair]
t1:select from volatilitysurface_smile where date=adate,ccypair=accypair;
mycols:`atm`s10c`s10p`s25c`s25p;
t2:`t xkey 0!exec mycols#(stype!mid) by t:t from t1;
t3:`t xkey select distinct t,tenor,xi,volofvol,delta_type,spread from t1;
result:ej[`t;t2;t3];
:result}
I then call this function for every datetime adates
as follows:
raze {[accypair;adate] `date xcols update date:adate from fPivot[adate;accypair] }[`EURCHF] @/: adates;
this takes about 90s. I wonder if there is a better way e.g. do a big pivotization rather than running one pivotization per date and then stitching it all together. The big issue I see is that I have no apparent way to include the date
attribute as part of the pivotization and the date
can not be lost otherwise I can't reconciliate the results.
Upvotes: 1
Views: 182
Reputation: 14317
OK I solved the issue by creating a batch version of the pivotization that keeps the date (datetime) table field when doing the group by bit needed to pivot i.e. by t:t from ...
to by date:date,t:t from ...
. It went from 90s down to 150 milliseconds.
fBatchPivot:{[adates;accypair]
t1:select from volatilitysurface_smile where date in adates,ccypair=accypair;
mycols:`atm`s10c`s10p`s25c`s25p;
t2:`date`t xkey 0!exec mycols#(stype!mid) by date:date,t:t from t1;
t3:`date`t xkey select distinct date,t,tenor,xi,volofvol,delta_type,spread from t1;
result:0!(`date`t xasc t2 ij t3);
:result}
Upvotes: 1
Reputation: 5644
If you havent been to the wiki page on pivoting then it may be a good start. There is a section on a general pivoting function that makes some claims to being somewhat efficient:
One user reports:
This is able to pivot a whole day of real quote data, about 25 million quotes over about 4000 syms and an average of 5 levels per sym, in a little over four minutes.
As for general comments, I would say that the ej
is unnecessary as it is a more general version of ij
, allowing you to specify the key column. As both t2
and t3
have the same keying I would instead use:
t2 ij t3
Which may give you a very minor performance boost.
Upvotes: 1