Utsav
Utsav

Reputation: 5918

Difference in partitioned and non-partitioned table in terms of vertical join in q kdb

I have two non-partitioned tables:

q)s:([] date:(2019.07.01;2019.07.01;2019.07.02;2019.07.01;2019.07.05); co:`a`b`f`b`c)
q)t:([] date:(2019.07.01;2019.07.01;2019.07.02;2019.07.01;2019.07.07); co:`a`b`e`b`d)

In above table when I run below query it works perfectly fine.

q)select distinct co from s,t where date within 2019.07.01 2019.07.02
co
--
a
b
f
e

I have tables with same name which are partitioned by date, when I try to run same query on partitioned tables I get below error:

ERROR: 'par 
(trying to update a physically partitioned table)
  1. Why do we get above error in partitioned tables?
  2. What is the optimized approach to get similar output as we got in non-partitioned tables?

One solution to for 2 which I feel as brute-force is:

select distinct co from((select distinct co from s where date within 2019.07.01 2019.07.02),select distinct co from t where date within 2019.07.01 2019.07.02)

Upvotes: 1

Views: 323

Answers (1)

Callum Biggs
Callum Biggs

Reputation: 1540

I'm assuming you are only including the date name in the source tables to assist in queries. A date partitioned table will generate the virtual date column from the hdb structure, you shouldn't include it in the actual table being written to.

  1. Why do we get above error in partitioned tables?

There is no way to avoid having to access the data of a partitioned table except through an initial a select statement.. In this case you are directly trying to perform a , operation to the s and t tables

  1. What is the optimized approach to get similar output as we got in non-partitioned tables?

In general, there may be a trade-off between the table size and the nature and frequency of the operations, sometimes it may be worth bringing the table into memory for frequent joins, or creating a top-level flat table with the relevant subset of data.

If this is just a generalized test case for larger operations then something along the following would be ideal

distinct raze {select distinct co from x where date within 2019.07.01 2019.07.02} each `s`t

This performance is not very different from your own query however, it's just a bit more succinct.

Upvotes: 1

Related Questions