David Moss
David Moss

Reputation: 21

in KDB q, how do you write a custom mavg that drops the top 5 and bottom 5 results before doing moving average

I have a 5min table that has a score value like the table below. How can I add a new column which is the 10-period moving average which drops the highest 2 and lowest 2 scores first, and just average the 6 remaining values?

q)select from scores_table

|datetime                       | score |
|------------------------------ | ----- |
|2022.10.28D22:20:00.000000000  | 223   |
|2022.10.28D22:25:00.000000000  | 221   |
|2022.10.28D22:30:00.000000000  | 231   |
|2022.10.28D22:35:00.000000000  | 182   |
|2022.10.28D22:40:00.000000000  | 204   |
|2022.10.28D22:45:00.000000000  | 143   |
|2022.10.28D22:50:00.000000000  | 228   |
|2022.10.28D22:55:00.000000000  | 203   |
|2022.10.28D23:00:00.000000000  | 158   |
|2022.10.28D23:05:00.000000000  | 98    |
|2022.10.28D23:10:00.000000000  | 111   |
|2022.10.28D23:15:00.000000000  | 146   |
|2022.10.28D23:20:00.000000000  | 94    |
|2022.10.28D23:25:00.000000000  | 159   |
|2022.10.28D23:30:00.000000000  | 93    |
|2022.10.28D23:35:00.000000000  | 91    |
|2022.10.28D23:40:00.000000000  | 132   |
|2022.10.28D23:45:00.000000000  | 77    |
|2022.10.28D23:50:00.000000000  | 179   |
|2022.10.28D23:55:00.000000000  | 140   |

In other words, I want to do something like "update mavg10:10 mavg score from scores_table" but before doing the average of the past 10 items, drop the highest 2 and lowest 2 scores, then do average on the rest. And then have the result in a new column.

I also read: https://code.kx.com/q/kb/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window and I tried modifying the swin2 code snippet there, but I couldn't get it to work :(

Any advice would be appreciated, thanks.

Upvotes: 1

Views: 36

Answers (1)

If you look at swin from the link you posted, you can see that it uses the inbuild avg function. So all you have to do, is to write a function, that takes the elements at index 2 to 7 of a sorted 10 element list (index 0 to 7, discharging index 0,1 and 8,9) and calculate the avg of those.

q)swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
q)swin[avg; 3; til 10]

let's build that function

x where (iasc iasc 0N!x:-10?10) within 2 7
8 0 1 4 9 6 7 3 5 2
4 6 7 3 5 2

as you can see from above, applying iasc twice gives you the indexes of the list (sorted in ascending order), we then ignore index 0,1 and 8,9. now apply the average

f:{avg x where (iasc iasc x) within 2 7}

we can now use the function f together with swin

t:([] datetime:.z.D+20?.z.t; score:20?1000)
t
datetime                      score
-----------------------------------
2025.03.02D14:34:54.963000000 922
2025.03.02D08:24:53.628000000 360
2025.03.02D09:26:16.172000000 605
2025.03.02D01:38:45.609000000 550
2025.03.02D11:01:05.002000000 489
2025.03.02D10:40:23.165000000 496
2025.03.02D15:01:41.486000000 989
2025.03.02D08:30:06.037000000 704
2025.03.02D03:28:55.129000000 416
2025.03.02D09:57:28.855000000 366
2025.03.02D09:28:13.942000000 347
2025.03.02D11:03:58.982000000 270
2025.03.02D14:37:23.380000000 221
2025.03.02D00:06:59.623000000 396
2025.03.02D13:02:02.104000000 206
2025.03.02D15:02:05.012000000 169
2025.03.02D04:35:02.468000000 437
2025.03.02D05:00:51.910000000 450
2025.03.02D09:37:08.266000000 187
2025.03.02D12:09:01.385000000 798

update swin[f;10;score] from t
datetime                      score
--------------------------------------
2025.03.02D14:34:54.963000000 0
2025.03.02D08:24:53.628000000 0
2025.03.02D09:26:16.172000000 60
2025.03.02D01:38:45.609000000 151.6667
2025.03.02D11:01:05.002000000 233.1667
2025.03.02D10:40:23.165000000 315.8333
2025.03.02D15:01:41.486000000 416.6667
2025.03.02D08:30:06.037000000 534
2025.03.02D03:28:55.129000000 543.3333
2025.03.02D09:57:28.855000000 543.3333
2025.03.02D09:28:13.942000000 487
2025.03.02D11:03:58.982000000 487
2025.03.02D14:37:23.380000000 444
2025.03.02D00:06:59.623000000 418.3333
2025.03.02D13:02:02.104000000 381.8333
2025.03.02D15:02:05.012000000 336
2025.03.02D04:35:02.468000000 336
2025.03.02D05:00:51.910000000 336
2025.03.02D09:37:08.266000000 301
2025.03.02D12:09:01.385000000 312.8333

Edit: Note, index within 2 7 is hardcoded but that can easily be modified

Upvotes: 1

Related Questions