SkyWalker
SkyWalker

Reputation: 14317

How to optimize a batch pivotization?

I have a datetime list (which for some reason I call it column date) containing over 1k datetime.

adates:2017.10.20T00:02:35.650 2017.10.20T01:57:13.454 ...

For each of these dates I need to select the data from some table, then pivotize by a column t i.e. expiry, add the corresponding date datetime as column to the pivotized table and stitch together the pivotization for all the dates. Note that I should be able to identify which pivotization corresponds to a date and that's why I do it one by one:

fPivot:{[adate;accypair]
     t1:select from volatilitysurface_smile where date=adate,ccypair=accypair;
     mycols:`atm`s10c`s10p`s25c`s25p;
     t2:`t xkey 0!exec mycols#(stype!mid) by t:t from t1;
     t3:`t xkey select distinct t,tenor,xi,volofvol,delta_type,spread from t1;
     result:ej[`t;t2;t3];
     :result}

I then call this function for every datetime adates as follows:

raze {[accypair;adate] `date xcols update date:adate from fPivot[adate;accypair] }[`EURCHF] @/: adates;

this takes about 90s. I wonder if there is a better way e.g. do a big pivotization rather than running one pivotization per date and then stitching it all together. The big issue I see is that I have no apparent way to include the date attribute as part of the pivotization and the date can not be lost otherwise I can't reconciliate the results.

Upvotes: 1

Views: 182

Answers (2)

SkyWalker
SkyWalker

Reputation: 14317

OK I solved the issue by creating a batch version of the pivotization that keeps the date (datetime) table field when doing the group by bit needed to pivot i.e. by t:t from ... to by date:date,t:t from .... It went from 90s down to 150 milliseconds.

fBatchPivot:{[adates;accypair]
    t1:select from volatilitysurface_smile where date in adates,ccypair=accypair;
    mycols:`atm`s10c`s10p`s25c`s25p;
    t2:`date`t xkey 0!exec mycols#(stype!mid) by date:date,t:t from t1;
    t3:`date`t xkey select distinct date,t,tenor,xi,volofvol,delta_type,spread from t1;
    result:0!(`date`t xasc t2 ij t3);
    :result}

Upvotes: 1

Thomas Smyth
Thomas Smyth

Reputation: 5644

If you havent been to the wiki page on pivoting then it may be a good start. There is a section on a general pivoting function that makes some claims to being somewhat efficient:

One user reports:

This is able to pivot a whole day of real quote data, about 25 million quotes over about 4000 syms and an average of 5 levels per sym, in a little over four minutes.

As for general comments, I would say that the ej is unnecessary as it is a more general version of ij, allowing you to specify the key column. As both t2 and t3 have the same keying I would instead use:

t2 ij t3

Which may give you a very minor performance boost.

Upvotes: 1

Related Questions