Reputation: 1383
That's my current query, it works but it is slow:
SELECT row, MIN(flg) ||' to ' ||Max (flg) as xyz , avg(amt_won), count(*)
FROM(
SELECT (ROW_NUMBER() OVER (ORDER BY flg))*100/
(SELECT count(*)+100 as temprow FROM temporary_six_max) as row, flg, amt_won
FROM temporary_six_max
JOIN (
SELECT id_player AS pid, avg(flg_vpip::int) AS flg
FROM temporary_six_max
GROUP BY id_player
) AS auxtable
ON pid = id_player
) as auxtable2
group by 1
order by 1;
I am grouping in fixed (or almost fixed) count 100 ranges that are ordered by avg(flg_vpip) grouped by id_player.
Here I've pasted the results in case it may help to understand: https://spreadsheets0.google.com/ccc?key=tFVsxkWVn4fMWYBxxGYokwQ&authkey=CNDvuOcG&authkey=CNDvuOcG#gid=0
I wonder if there is a better function to use than ROW_NUMBER() in this case and I feel like I am doing too many subselects but I don't know how to optimize it.
I'll appreciate very much any help.
If something is not clear just let me know.
Thank you.
EDIT:
The reason I created auxtable 2, is because when I use (ROW_NUMBER() OVER (ORDER BY flg), and use other agregate commands such as avg(amt_won) and count(*), which are essential, I get an error saying that flg should be in the aggregate function, but I can't order by an aggregate function of flg.
Upvotes: 0
Views: 288
Reputation: 133802
I generated some data to test with like this:
create table temporary_six_max as
select id_player, flg_vpip,
random()*100 * (case flg_vpip when 0 then 1 else -1 end) as amt_won
from (select (random()*1000)::int as id_player, random()::int as flg_vpip
from generate_series(1,1000000)) source;
create index on temporary_six_max(id_player);
Your query runs successfully against that, but doesn't quite generate the same plan, I get a nested loop in the lower arm rather than a merge and a seq scan in the init-plan-- you haven't turned off enable_seqscan I hope?
A solution just using a single scan of the table:
select row, min(flg) || ' to ' || max(flg) as xyz, avg(amt_won), count(*)
from (select flg, amt_won, ntile(100) over(order by flg) as row
from (select id_player as pid, amt_won,
avg(flg_vpip::int) over (partition by id_player) as flg
from temporary_six_max
) player_stats
) chunks
group by 1
order by 1
The bad news is that this actually performs worse on my machine, especially if I bump work_mem up enough to avoid the first disk sort (making player_stats, sorting by flg). Although increasing work_mem did halve the query time, so I guess that is at least a start?
Having said that, my queries are running for about 5 seconds to process 10E6 input rows in temporary_six_max, which is an order of magnitude faster than you posted. Does your table fit into your buffer cache? If not, a single-scan solution may be much better for you. (Which version of Postgresql are you using? "explain (analyze on, buffers on) select..." will show you buffer hit/miss rates in 9.0, or just look at your "shared_buffers" setting and compare with the table size)
Upvotes: 2