Reputation: 8713
I want to process all the rows of a kdb table in an R program (I use qserver.R). One way to do this is to initialize a memory handler and then iterate through all the rows one of the time, as explained here:
t: select from mytable where ts>12:30:00,ts<15:00:00,price,msg="A"
t[0]
t[1]
t[2]
...
I want to limit the number of client/server calls in R to loop as fast as possible. How can I fetch multiple rows for each call?
Upvotes: 2
Views: 1609
Reputation: 1919
NOTE: my answer below assumes that mytable
is the partioned database, but that you now have t
in memory.
another option using cut
(using "chunks" of 1,000,000 as per your earlier post)
(`int$1e6) cut t
now you have a list of table "chunks" of your desired size and you can use accordingly.
I frequently use this for certain functions (particularly in combination with peach
).
A pattern I've found useful is:
f:`function that does something useful on chunks`
fa:`function that reaggregates up to final results`
r:fa raze f peach (`int$`size`)cut t
if you're t
is really large (both vertical/horizontal) you might want to avoid cut
directly on the table for memory reasons, but can instead cut
a list of indices for the table into the appropriate size and then feed the indices to your f
and have that index to the t
and grab what you want.
Below a quick comparison of both approaches (note that f
here is pointless, but just to prove the point of the cut
on t
versus indices)
q)t:flip (`$"c",/:string til 100)!{(`int$1e7)?100} each til 100
q)\ts a:raze {select c1,c99 from x}each 1000 cut t
3827 4108103072j
q)\ts b:raze {select c1,c99 from t[x]}each 1000 cut til count t
3057 217623200j
q)4108103072j%217623200j
18.87714
q)a~b
1b
Upvotes: 1
Reputation: 2605
From your previous questions I assume this is a 1 person system so what benefit are you getting from kdb? Why not work fully in R and just use flat memory mapped files directly there? Avoiding unneeded complexity and overhead. If all you want to do is stream the data through R in order that should be simple.
Rather than "ts>12:30:00,ts<15:00:00" use "ts within (12:30:00;15:00:00)" it's quicker.
The larger the size of chunks you process in the more efficient it is likely to be. 100 seems quite small.
Regards, Ryan Hamilton
Upvotes: 1
Reputation: 8713
Sorted out, this returns 100 rows each time:
\l /data/mydb
t: select from mytable where ts>12:30:00,ts<15:00:00,price,msg="A"
select [0 100] from t
select [100 100] from t
select [200 100] from t
..
Upvotes: 0