marrowgari

Reputation: 427

Kdb database to NumPy array in PyQ

I have a splayed Kdb database containing symbols, floats, and timestamps, and I'd like to convert it to NumPy arrays. However, the following code...

>>> import numpy as np
>>> from pyq import q
>>> d = q.load(':alpha/HDB/')
>>> a = np.array(d)

Returns this error...

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/marrowgari/py3/lib/python3.6/site-packages/pyq/_n.py", line 158, in array
    return numpy.array(list(self), dtype)
TypeError: iteration over a K scalar, t=-11

Is this because Kdb symbol types do not have a direct analogue in NumPy? If so, how do I correct this?

Upvotes: 2

Views: 897

Answers (2)

Alexander Belopolsky

Reputation: 2268

Suppose your HDB was created as follows:

q)(` sv db,`t`)set .Q.en[db:`:alpha/HDB]([]sym:`A`B`C;a:1 2 3)
`:alpha/HDB/t/
q)\l alpha/HDB
q)t
sym a
-----
A   1
B   2
C   3

First of all, you should load it using the \l command, not the load function:

>>> q('\\l alpha/HDB')
k('::')

This will load all your tables and enumeration domains.

Now you should be able to convert the sym column of your table to a numpy array of strings

>>> np.array(q.t.sym)
array(['A', 'B', 'C'], dtype=object)

or to a numpy array of integers:

>>> np.array(q.t.sym.data)
array([0, 1, 2])
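
The relationship between these two results can be reproduced in plain NumPy. This is only a sketch and does not call PyQ: `domain` and `codes` are hypothetical names standing in for the sym enumeration domain and the integer data shown above.

```python
import numpy as np

# Hypothetical stand-ins: the enumeration domain (the sym list) and the
# integer indices that an enumerated column stores.
domain = np.array(['A', 'B', 'C'], dtype=object)
codes = np.array([0, 1, 2])

# Indexing the domain by the codes recovers the symbol values.
symbols = domain[codes]
print(symbols)
```

The integer form is usually more compact when the same symbols repeat many times, and the string form is easier to read.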

You can also convert the entire table to a NumPy record array in one go, but you first have to bring it into memory with select():

>>> np.array(q.t.select())
array([('A', 1), ('B', 2), ('C', 3)], dtype=[('sym', 'O'), ('a', '<i8')])
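
Once you have a record array like this, each column can be pulled out by field name. The sketch below rebuilds the same array by hand (no PyQ involved) to show the idea. The variable names are only illustrative.

```python
import numpy as np

# Rebuild the record array shown above by hand; in practice it would come
# from np.array(q.t.select()).
records = np.array([('A', 1), ('B', 2), ('C', 3)],
                   dtype=[('sym', 'O'), ('a', '<i8')])

# Each field name gives a regular one-dimensional array for that column.
sym_col = records['sym']
a_col = records['a']

# Column arrays support the usual NumPy operations.
total = a_col.sum()
print(sym_col, a_col, total)
```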

Upvotes: 3

Jonathon McMurray

Reputation: 2981

I don't think .q.load does what you're expecting: the return value of this function is simply a null symbol. I think you need to use .q.get instead, e.g.

jmcmurray@host ~/hdb $ pyq
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> from pyq import q
>>> q.load("sym")
k('`sym')
>>> np.array(q.get(":2014.04.21/trades").select())
array([('AAPL', '2014-04-21T08:00:37.853000000', 'O',  25.33, 5048),
       ('AAPL', '2014-04-21T08:00:58.840000000', 'O',  25.35, 4580),
       ('AAPL', '2014-04-21T08:01:40.150000000', 'O',  25.35, 5432), ...,
       ('YHOO', '2014-04-21T16:29:06.868000000', 'L',  35.32, 4825),
       ('YHOO', '2014-04-21T16:29:43.655000000', 'L',  35.32, 6125),
       ('YHOO', '2014-04-21T16:29:57.229000000', 'L',  35.36,   41)],
      dtype=[('sym', 'O'), ('time', '<M8[ns]'), ('src', 'O'), ('price', '<f8'), ('size', '<i4')])
>>>

Note here I first use .q.load to load the sym file, as the symbol columns are enumerated. Then I load one splayed table from my HDB, which should be equivalent to your splayed table.

I also call .select() on the table because .q.get() simply maps the table into memory (same as get in KDB); select is needed to pull the actual data into memory.
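
Since the time column comes back as datetime64[ns], the resulting array can be filtered by time directly in NumPy. A minimal sketch, using four of the rows from the output above typed in by hand rather than loaded through PyQ; the cutoff time and variable names are made up for illustration.

```python
import numpy as np

# Same dtype as in the output above.
dtype = [('sym', 'O'), ('time', '<M8[ns]'), ('src', 'O'),
         ('price', '<f8'), ('size', '<i4')]

# A few rows copied from the output above (not loaded from kdb+ here).
trades = np.array([
    ('AAPL', np.datetime64('2014-04-21T08:00:37.853', 'ns'), 'O', 25.33, 5048),
    ('AAPL', np.datetime64('2014-04-21T08:00:58.840', 'ns'), 'O', 25.35, 4580),
    ('YHOO', np.datetime64('2014-04-21T16:29:43.655', 'ns'), 'L', 35.32, 6125),
    ('YHOO', np.datetime64('2014-04-21T16:29:57.229', 'ns'), 'L', 35.36, 41),
], dtype=dtype)

# Hypothetical cutoff: keep only trades at or after 16:00.
cutoff = np.datetime64('2014-04-21T16:00')
late = trades[trades['time'] >= cutoff]

# Total traded size among the late trades.
late_size = int(late['size'].sum())
print(late['sym'], late_size)
```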

Upvotes: 2
