Tick data in kdb partitioned-by-date table: should I use a time or datetime column?

Question

What is the standard canonical temporal column for a tick-level dataset (trades or quotes) in q/kdb+ when using a table partitioned by date? Should I have a time or a datetime column in addition to the partition? I'm seeing the following pros and cons, but maybe I'm missing something:

Timestamp columns: Con: I'm storing redundant data, both on disk and when displayed; so a select from table where date=max date effectively shows me the date twice; but Pro: it's much easier to interface with Python and I can more easily re-partition the data if I wanted to

Time columns: Con: Messier to interface with Python; Pro: I don't see the date twice when doing a select

I am aware of the sizes in bytes of the individual datatypes, so I understand I may not be saving space if for example I would be replacing a timespan (`n) with a timestamp (`p), but I can't for the life of me ever remember doing a select (I used to work in a bank with q as well) and ever seeing the date twice!

terrylynch · Accepted Answer

One of the most important considerations for the time column in on-disk tick data is whether or not it can be used in an as-of join aj. For this you want the time column to be sorted within each ticker/sym (assuming you have p# on ticker/sym column, which is recommended for tick data), and so long as this time column has an underlying long/integer type (which both timestamp and timespan do, datetime doesn't) then an aj will be optimal.

So the fact that you "see" the date again isn't a big concern. One other thing to be aware of: if you do go with timestamps, it is possible that your timestamps can span more than one date within a given date slice? i.e. could you have records from the day before? If so then something like this:

select from table where date=2022.02.01, time>23:00

could theoretically return the records which are >23:00 from both dates.

Tick data in kdb partitioned-by-date table: should I use a time or datetime column?

Answers (2)

Related Questions