Reputation: 942
I have a problem with missing data. So I have a returns table created this way:
returns:update logret:log ret from update ret:{0.0, 1_deltas x} mid by sym from spots;
and looking like this:
meta returns
c | t f a
------| -----
date | d
time | p
sym | s
mid | f
ret | f
logret| f
and I pivot it this way:
rettbl:0!exec (raze `$raze ( string (distinct sym)),\:/:("_",/:string `mid`ret`logret))!(mid, ret, logret) by time:time from returns;
so that works on a small range of time, but when I extend it fails with a "type: Mismatched types"
I believe the problem is that I don't have the same observations for all sym in my returns
table, as
select count time by sym from returns
gives me a different count by sym.
So my question is: how would you go about filling any missing mid
for all times I have in my table returns
?
To make it quick and dirty i was thinking about having a linear interpolation for missing points between two `time ...
Any hint, link, etc... would be greatly appreciated.
EDIT:
I can't share the data, but this should allow to reproduce the issue:
mid1: 1.2 + ({rand 1.0} each til 10) %100
mid2: 0.8 + ({rand 1.0} each til 10) %100
mid3: 104 + ({rand 1.0} each til 10) %100
sym1:{`$"EUR/USD"} each til 10
sym2:{`$"GBP/USD"} each til 10
sym3:{`$"USD/JPY"} each til 10
time1:2020.07.01D00:00:00.000000000 + 1D00:00:00 * til 10
/time2:2020.07.01D00:00:00.000000000 + 1D00:00:00 * til 10
time2:(2020.07.01D00:00:00.000000000 + 1D00:00:00 * til 3) , (2020.07.06D00:00:00.000000000 + 1D00:00:00 * til 3) , (2020.07.10D00:00:00.000000000 + 1D00:00:00 * til 4)
time3:2020.07.01D00:00:00.000000000 + 1D00:00:00 * til 10
spots:([] sym:sym1,sym2,sym3; time:time1,time2,time3; mid:mid1,mid2,mid3)
spots:update date:"d"$time from spots
returns:update logret:log ret from update ret:{0.0, 1_deltas x} mid by sym from spots;
rettbl:0!exec (raze `$raze ( string (distinct sym)),\:/:("_",/:string `mid`ret`logret))!(mid, ret, logret) by time:time from returns;
do note the time2 definition.
as per the desired output, if you replace the time2 definition with the one I commented out you will see.
as per the solution, I suspect that adding to the table returns the missing time, so that the list of time is the same for all sym are identical will solve the issue, ie value select time by sym from returns
should be identical, and a quick a dirty work around would be to add missing mid with a linear interpolation of the surrounding mid.
Upvotes: 0
Views: 697
Reputation: 2800
The P#
is pretty crucial here in order to produce the pivot properly. This will place nulls where time is missing for a sym col. https://code.kx.com/q/kb/pivoting-tables/
{
P:exec (raze `$raze ( string (distinct sym)),\:/:("_",/:string `mid`ret`logret)) from returns;
exec P#(raze `$raze ( string (distinct sym)),\:/:("_",/:string `mid`ret`logret))!(mid, ret, logret) by time:time from returns
}[]
Output:
time | EUR/USD_mid GBP/USD_mid USD/JPY_mid EUR/USD_re..
-----------------------------| ----------------------------------------------..
2020.07.01D00:00:00.000000000| 1.203915 0.8078337 104.0062 0 ..
2020.07.02D00:00:00.000000000| 1.200812 0.8040996 104.0093 -0.0031030..
2020.07.03D00:00:00.000000000| 1.209368 0.8061088 104.0027 0.00855514..
2020.07.04D00:00:00.000000000| 1.202782 104.0006 -0.0065853..
2020.07.05D00:00:00.000000000| 1.202392 104.0026 -0.0003897..
2020.07.06D00:00:00.000000000| 1.201508 0.8049765 104.0023 -0.0008842..
2020.07.07D00:00:00.000000000| 1.201567 0.8040875 104.0009 5.91839e-0..
2020.07.08D00:00:00.000000000| 1.209785 0.8044973 104.001 0.00821768..
2020.07.09D00:00:00.000000000| 1.207043 104.0087 -0.0027416..
2020.07.10D00:00:00.000000000| 1.209442 0.8001392 104.0073 0.00239835..
2020.07.11D00:00:00.000000000| 0.8071488 ..
2020.07.12D00:00:00.000000000| 0.8019465 ..
You are getting a type error because the code is creating a dictionary when time is not present for all syms.
q)0!exec (raze `$raze ( string (distinct sym)),\:/:("_",/:string `mid`ret`logret))!(mid, ret, logret) by time:time from returns
'type
[0] 0!exec (raze `$raze ( string (distinct sym)),\:/:("_",/:string `mid`ret`logret))!(mid, ret, logret) by time:time from returns
^
q)exec (raze `$raze ( string (distinct sym)),\:/:("_",/:string `mid`ret`logret))!(mid, ret, logret) by time:time from returns
time | ..
-----------------------------| ----------------------------------------------..
2020.07.01D00:00:00.000000000| `EUR/USD_mid`GBP/USD_mid`USD/JPY_mid`EUR/USD_r..
2020.07.02D00:00:00.000000000| `EUR/USD_mid`GBP/USD_mid`USD/JPY_mid`EUR/USD_r..
2020.07.03D00:00:00.000000000| `EUR/USD_mid`GBP/USD_mid`USD/JPY_mid`EUR/USD_r..
2020.07.04D00:00:00.000000000| `EUR/USD_mid`USD/JPY_mid`EUR/USD_ret`USD/JPY_r..
2020.07.05D00:00:00.000000000| `EUR/USD_mid`USD/JPY_mid`EUR/USD_ret`USD/JPY_r..
Edit: To remove nulls you could use fills
to fill forward non-null values. You also may want to use .Q.id
to sanitize the table first. Columns such as EUR/USD_mid
are tricky to work with due to the confusion with over /
.
https://code.kx.com/q/ref/dotq/#qid-sanitize
q)fills x
time | EURUSD_mid GBPUSD_mid USDJPY_mid EURUSD_ret ..
-----------------------------| ----------------------------------------------..
2020.07.01D00:00:00.000000000| 1.203928 0.8049318 104.0047 0 ..
2020.07.02D00:00:00.000000000| 1.205171 0.8057852 104.0063 0.001243387 ..
2020.07.03D00:00:00.000000000| 1.20516 0.8008389 104.0097 -1.11147e-005..
2020.07.04D00:00:00.000000000| 1.204067 0.8008389 104.0023 -0.001093155 ..
2020.07.05D00:00:00.000000000| 1.201781 0.8008389 104.0095 -0.002285803 ..
This will remove most nulls except for any that exist in the first row. I don't have any experience writing something such as linear interpolation in kdb and I don't think it will replace nulls as well as fills. For example, the example data set has 3 nulls at the end in the EURUSD_mid column so can't be interpolated for those 3 entries.
Upvotes: 3