Utsav

Reputation: 5918

Update a serialized table on disk in kdb

I have a serialized table on disk which I want to update based on a condition. One way of doing this is to load the table into memory, update it and then serialize it to disk again, e.g.:

q)`:file set ([] id:10 20; l1:("Blue hor";"Antop"); l2:("Malad"; "KC"); pcd:("NCD";"FRB") );
q)t:get `:file;
q)update l1:enlist "Chin", l2:enlist "Gor" from `t where id=10;
q)`:file set t;

I tried updating the table directly on disk but received type error:

q)update l1:enlist "Chin", l2:enlist "Gor" from `:file where id=10
'type
  [0]  update l1:enlist "Chin", l2:enlist "Gor" from `:file where id=10

Is there a way in which we can update the serialized table directly on disk? (In one of the cases we don't have enough primary memory to load the serialized table.)

Upvotes: 1

Views: 842

Answers (1)

Adam Bonham

Reputation: 615

If you save your table as one flat file, the whole table has to be loaded in, updated and then written down again, requiring enough memory to hold the full table. To avoid this you can splay your table by adding a trailing / to your filepath, i.e.

`:file/ set ([] id:10 20; l1:("Blue hor";"Antop"); l2:("Malad"; "KC"); pcd:("NCD";"FRB") );

If symbol columns are present they will need to be enumerated using .Q.en.
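For example, a minimal sketch of enumerating before splaying, assuming a hypothetical table with a symbol column sym and a database root `:db (neither appears in the question's data):

/ hypothetical: sym is a symbol column, `:db is the database root holding the sym file
t:([] id:10 20; sym:`abc`def)
`:db/file/ set .Q.en[`:db] t    / enumerates sym against `:db/sym, then splays the table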

This will split the table vertically and save your columns as individual files, under a directory called file. Having the columns saved as individual files allows you to only load in the columns required as opposed to the entire table, resulting in smaller memory requirements. You only need to specify the required columns in a select query.

https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#142-splayed-tables
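For example, assuming the splayed `:file written above, a select only touches the column files it names:

t:get`:file                     / memory maps the splayed table
select l1 from t where id=10    / only the id and l1 column files are read from disk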

You can further split your data horizontally by partitioning, giving smaller subsets again if individual columns are still too big.

https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#143-partitioned-tables
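A minimal sketch of a date-partitioned layout, assuming a hypothetical database root `:db and daily partitions (not taken from the question):

`:db/2024.01.01/t/ set ([] id:10 20; l1:("Blue hor";"Antop"))
`:db/2024.01.02/t/ set ([] id:enlist 30; l1:enlist "Malad")
/ after loading the root with \l db, t is available as a partitioned table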

When you run

get`:splayedTable

this memory-maps the table (assuming your columns are mappable) rather than copying it into memory, which can be confirmed with .Q.w[].
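As a rough check, assuming the splayed `:file from above (exact numbers will vary):

.Q.w[]`used      / heap in use before
t:get`:file      / maps the splayed columns rather than copying them
.Q.w[]`used      / roughly unchanged, since the columns are mapped, not copied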

You could do

tab:update l1:enlist "Chin", l2:enlist "Gor" from (select id, l1,l2 from get`:file) where id=10
`:file/l1 set tab`l1
`:file/l2 set tab`l2

If loading only the required columns for your query into memory is still too much, you can load them one at a time. First load id and find the required indices (where id=10), delete id from memory, then load in l1 and modify it using the indices,

@[get`:file/l1;indices;:;enlist"Chin"]

write it down and then delete it from memory; then do the same with l2. This way you would have at most one column in memory at a time (see the sketch below). Ideally your table would be appropriately partitioned so you can hold the data in memory.
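Putting that together, a minimal sketch of the column-at-a-time flow, assuming the splayed `:file from earlier:

id:get`:file/id                   / load only the id column
indices:where id=10               / row positions to modify
delete id from `.                 / free id before loading l1
l1:@[get`:file/l1;indices;:;enlist"Chin"]
`:file/l1 set l1                  / write the modified column back down
delete l1 from `.                 / free l1 before moving on to l2
l2:@[get`:file/l2;indices;:;enlist"Gor"]
`:file/l2 set l2
delete l2 from `.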

You can also directly update vectors on disk, which avoids having to rewrite the whole file, for example,

ind:exec i from get`:file where id=10

@[`:file/l1;ind;:;enlist"Chin"]
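To complete the update from the question, the same on-disk amend applies to l2:

@[`:file/l2;ind;:;enlist"Gor"]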

Though there are some restrictions on the file, which are covered in the link below:

https://code.kx.com/q/ref/amend/#functional-amend

Upvotes: 2
