Reputation: 3063
Is it possible to get the average of the top X% items in a group?
For example:
I have a table which has a item_id, timestamp and price column. The output should be grouped by item_id and timestamp and the 'price-column' should get averaged. For the averaging only the lowest X% prices within that group should be used.
I've found similar questions (How to select top x records for every group) but this won't work with sqlite.
Upvotes: 3
Views: 2223
Reputation: 1270411
Getting the top n records within each group requires counting. Assuming that there are no duplicates, the following query returns the number of records for an item:
select t.*,
(select count(*) from t t2 where t2.item_id = t.item_id
) as NumPrices
from t
This is called a correlated subquery. Now, let's extend the idea to include a rank and then calculate the average for the right group:
select item_id, avg(price)
from (select t.*,
(select count(*) from t t2 where t2.item_id = t.item_id
) as NumPrices,
(select count(*) from t t2 where t2.item_id = t.item_id and t2.price <= t.price
) as PriceRank
from t
) t
where (100.0*PriceRank / NumPrices) <= X
group by item_id
To improve performance, you will want an index on (item_id, price)
.
Upvotes: 5
Reputation: 180172
To get the count of records in the group with ID I
and timestamp T
, use this query:
SELECT COUNT(*)
FROM MyTable
WHERE item_id = I
AND timestamp = T
To get the limit, multiply with X
, and use ROUND
/CAST
to convert to an integer:
SELECT CAST(ROUND(COUNT(*) * X / 100) AS INTEGER)
FROM MyTable
WHERE item_id = I
AND timestamp = T
To get all records in a specific group that are inside that limit, order the records in the group by price, and limit the returned count:
SELECT *
FROM MyTable
WHERE item_id = I
AND timestamp = T
ORDER BY price
LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100) AS INTEGER)
FROM MyTable
WHERE item_id = I
AND timestamp = T)
In theory, to get the group averages, add GROUP BY
around that:
SELECT item_id,
timestamp,
(SELECT AVG(price)
FROM (SELECT price
FROM MyTable T2
WHERE T2.item_id = T1.item_id
AND T2.timestamp = T1.timestamp
ORDER BY price
LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100) AS INTEGER)
FROM MyTable T3
WHERE T3.item_id = T1.item_id
AND T3.timestamp = T1.timestamp)
)
) AS AvgPriceLowestX
FROM MyTable T1
GROUP BY item_id,
timestamp
However, it appears that SQLite does not allow access to correlation variables from inside the LIMIT
clause, so this does not work in practice.
You would have to get the IDs of all groups (SELECT DISTINCT item_id, timestamp FROM MyTable
) and execute the third query above for each group.
In any case, ensure that you have one index on the three columns item_id
, timestamp
, and price
to get good performance.
Upvotes: 1