Gacek
Gacek

Reputation: 10322

PostgreSQL: Selecting N result in group by query

I have following SQL query that selects some results from my table:

select  avg(c3), count(c3), std  
from ssims where obraz = 'lena' group by std order by std

But I have various number of tests performed for different values of std, so it returns me something like that:

0.906176136363636;44;5
0.881669302325582;43;10
0.855873409090909;44;15
0.829195813953488;43;20
0.802071590909091;44;25
0.774523720930233;43;30
0.747213636363636;44;35
0.720115581395349;43;40
0.694712954545455;44;45
0.668683255813953;43;50

What I would like to do is to select average of constant (i.e. 20) number of results for every std value. So after such query, the second column would be 20 for each row.

How to do it? I tried limits and top, but without success

Upvotes: 2

Views: 1791

Answers (3)

Quassnoi
Quassnoi

Reputation: 425683

In PostgreSQL 8.3:

SELECT  a[1] AS avg_std, a[2] AS cnt_std, std
FROM    (
        SELECT  (
                SELECT  ARRAY[AVG(c3) , COUNT(*)]
                FROM    (
                        SELECT  c3
                        FROM    ssims si
                        WHERE   obraz = 'lena'
                                AND si.std = so.std
                        ORDER BY
                                id
                        LIMIT 20
                        ) q
                ) a
        FROM    (
                SELECT  DISTINCT std
                FROM    ssims
                WHERE   obraz = 'lena'
                ) so
        ) q

This will count both AVG and COUNT in a single index scan for each std.

Create a composite index on (obraz, std, id) for this to work fast.

In PostgreSQL 8.4:

SELECT  AVG(c3), COUNT(*), std
FROM    (
        SELECT  std, c3, ROW_NUMBER() OVER (PARTITION BY std ORDER BY id) AS rn
        FROM    ssims
        WHERE   obraz = 'lena'
        ) q
WHERE   rn <= 20
GROUP BY
        std

Upvotes: 6

Magnus Hagander
Magnus Hagander

Reputation: 25128

If you're on 8.4, you should be able to do that with a window function. (not sure what the std part is, but I'm sure you can add that back) Something like this (untested, so you may need to adjust some things):

SELECT std,avg(c3), count(c3)
FROM (
 SELECT std, c3, row_number() OVER (
  PARTITION BY std ORDER BY random())
 ) foo
WHERE row_number <= 20
GROUP BY std
ORDER BY std

If you don't care that you actually get a random subset, you can remove the ORDER BY random() part, and it'll give you a "almost-random" one.

Upvotes: 0

Tendayi Mawushe
Tendayi Mawushe

Reputation: 26138

Assuming your ssims table has a unique id column which I have called id in my example you could do the following:

select avg(c3), count(c3), std from ssims where id in 
   (select id from ssims where obraz = 'lena' LIMIT 20)
   group by std order by std;

Upvotes: 0

Related Questions