Utsav

Reputation: 5918

Why can't the first column in kdb+tick be of string type?

I set the first column in the schema to string type, but the data was not being updated in the tickerplant.

When I changed the first column to symbol type, it worked perfectly.

While debugging, I came across an article saying that it is always good to make the first column of a table a timespan or symbol column.

  1. Why can't a string column be placed at the start of the table? Is it because a string is a list of characters (a complex type), or is there another reason?
  2. Why is it good, or required, to place a symbol/timespan column at the start of each table?

Upvotes: 2

Views: 359

Answers (2)

Callum Biggs

Reputation: 1540

The types used in your schemas should play nicely with the function that the feedhandler calls on the TP. I've broken down the default .u.upd for the TP when a timer is specified.

// All the following is defined in the .u context
upd:{[t;x]
    // t - symbol
    // x - list of lists
    // Check if the first type is a timespan. First list should be a list of timespan
        // Vanilla TP will add timespans to the data
    if[not -16=type first first x;
        // If the current date is later than the global .u.d then run the timer, causing an EOD
        if[d<"d"$a:.z.P;
            .z.ts[]];
        // Get the timespan
        a:"n"$a;
        // Append the timespan on, handling differently if only a list of atoms
        x:$[0>type first x;
            a,x;
            (enlist(count first x)#a),x]
        ];
    // Insert the data
    t insert x;
    // if the handle .u.l is defined, add to the logfile and increment the total count .u.j
    if[l;
        l enlist (`upd;t;x);j+:1];
    }

From this you can see that, by default, the TP expects the first column to be a timespan, and adds one if it isn't. This is useful if you are already sending in a timestamp, as you can then work out how long the data took to get from the feed to the TP.

By sending a string first, you cause the TP to add an extra column to your data. This would result in either a mismatch (if you were already supplying a timespan) or a type error (from inserting timespans into a string column, and vice versa).
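To make this concrete, here is a minimal sketch of the branch that prepends the timespan, pulled out of the upd above. The payloads good and bad and the helper addTime are made up for illustration, and a fixed timespan stands in for "n"$.z.P.

```q
// Hypothetical single-row payloads (not from the original post)
good:(`AAPL;101.5;200)      / symbol first
bad:("AAPL";101.5;200)      / string first

type first good             / -11h: an atom, so x is treated as one row
type first bad              / 10h: a char list, so x is treated as columns
type first first bad        / -10h: not -16h, so a timespan gets prepended

// Same conditional as in upd; a is passed in instead of taken from .z.P
addTime:{[a;x] $[0>type first x; a,x; (enlist(count first x)#a),x]}
a:0D09:30:00.000000000      / assumed fixed time, for repeatability

r1:addTime[a;good]          / timespan atom joined to the front of the row
r2:addTime[a;bad]           / first item is 4 timespans, one per character
```

With the string payload the prepended item holds four timespans while the rest is still one string and two atoms, so the payload is now shaped like four rows rather than one.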

Changing any of this would be trivial. For example, to check the second column for a timespan instead of the first, the test would just use first first 1_ x. Perhaps you would always want to check for EOD (i.e., even if you actually send the data with a timespan as the first column), in which case you would place the .z.ts[] call outside the first if statement.
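As a small illustration of the second-column check, assuming a hypothetical batch x whose timespan sits in the second position:

```q
// Hypothetical two-row batch: sym first, timespan second
x:(`AAPL`MSFT;0D09:30:00 0D09:30:01;101.5 250.1)

t1:type first first x       / -11h: what the stock check looks at
t2:type first first 1_ x    / -16h: the same check moved to the second column
```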

To summarise

  1. The column order is dependent on the update function you are using. The addition of a timespan allows for better timing of the throughput of your data from the feed through to ingestion. Depending on what you are trying to do you might want to add in a timestamp instead, or simply use the temporal values you add in your feedhandler.
  2. The use of strings and symbols is detailed here. I would say that Ferenc is partially incorrect: the use of a column called sym is strongly encouraged to conform with standards, but you can update your code to use a different column name relatively easily (you would need to adjust tick.q so that it does not check for the columns time and sym). It may make it more difficult to incorporate work by others, though.

Upvotes: 2

Ferenc Bodon

Reputation: 452

All tables handled by the tickerplant must have a sym column of type symbol. Subscribers must specify the table and can optionally provide a set of sym values to subscribe to.

Theoretically, the sym column could be a string column as well, but for performance reasons symbol is better. Looking up a symbol is faster than looking up a string, as symbol comparison is simply an integer comparison behind the scenes.
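A rough sketch of the difference in practice, using a made-up table t that stores the same identifiers both ways (the table and column names are assumptions for illustration only):

```q
// Hypothetical table holding the same identifier as a symbol and as a string
t:([] sym:`AAPL`MSFT`AAPL; str:("AAPL";"MSFT";"AAPL"); px:101.5 250.1 101.7)

bySym:select from t where sym=`AAPL        / = compares each symbol atom directly
byStr:select from t where str like "AAPL"  / strings need like (or ~ with each)

c1:`AAPL=`MSFT      / 0b: one comparison
c2:"AAPL"="MSFT"    / 0000b: = works character by character on strings
```

Both queries return the same two rows, but the symbol filter is a single atom comparison per row, whereas the string filter has to match each value's characters.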

Upvotes: 1
