Reputation: 47
I'm having trouble writing a query for Postgres (strictly speaking, it's Redshift).
The table data is below.
The table is PARTITION BY user_id ORDER BY created_at desc.
user_id| x | y | min | created_at
-------+---+---+------+---------------------
1| 1 | 1 | 1 | 2015-01-15 17:26:53
1| 1 | 1 | 2 | 2015-01-15 17:26:54
1| 1 | 1 | 3 | 2015-01-15 17:26:55
1| 2 | 1 | 10 | 2015-01-16 02:46:21
1| 1 | 1 | 15 | 2015-01-16 02:46:22
1| 3 | 3 | 11 | 2015-01-16 03:01:44
1| 3 | 3 | 2 | 2015-01-16 03:02:06
2| 1 | 1 | 3 | 2015-01-16 03:02:12
2| 2 | 1 | 4 | 2015-01-16 03:02:15
2| 2 | 1 | 7 | 2015-01-16 03:02:18
And this is what I want: min summed over each run of consecutive rows that share the same user_id, x, and y.
user_id| x | y | sum_min |
-------+---+---+----------+
1| 1 | 1 | 6 |
1| 2 | 1 | 10 |
1| 1 | 1 | 15 |
1| 3 | 3 | 13 |
2| 1 | 1 | 3 |
2| 2 | 1 | 11 |
If I simply use GROUP BY user_id, x, y, the result will be:
user_id| x | y | sum_min |
-------+---+---+----------+
1| 1 | 1 | 21 |
:| : | : | : |
This is not what I want :(
Upvotes: 0
Views: 65
Reputation: 32264
It seems that you want to calculate an aggregate over clusters of consecutive records, ordered by created_at, where a cluster is a run of rows that share the same values in three columns (user_id, x, y) and ends as soon as any of those values changes. A plain GROUP BY cannot do that, because grouping ignores the order of records. The fact that you order by date does not change that: GROUP BY on those three columns merges every matching row, whether or not the rows are adjacent.
One option is to create a plpgsql function with a cursor on your data relation (presumably a view, but it would work equally well with a table). You iterate over all the records in the relation in order and, for each cluster encountered, sum up the min values and output a new record with the clustering columns and the summed value.
CREATE FUNCTION sum_clusters()
RETURNS TABLE (user_id int, x int, y int, sum_min int) AS $$
DECLARE
    data_row data%ROWTYPE;
    -- Columns are qualified with d. so they do not clash with the output columns.
    cur CURSOR FOR SELECT d.* FROM data AS d ORDER BY d.user_id, d.created_at;
    cur_user integer;
    cur_x integer;
    cur_y integer;
    total integer;
BEGIN
    OPEN cur;
    FETCH cur INTO data_row;
    LOOP
        -- FOUND is false once the cursor is exhausted.
        IF NOT FOUND THEN
            EXIT;
        END IF;
        -- Start a new cluster at the current row.
        cur_user := data_row.user_id;
        cur_x := data_row.x;
        cur_y := data_row.y;
        total := data_row.min;
        LOOP
            FETCH cur INTO data_row;
            IF NOT FOUND THEN
                EXIT;
            END IF;
            IF (data_row.user_id = cur_user) AND (data_row.x = cur_x) AND (data_row.y = cur_y) THEN
                total := total + data_row.min;
            ELSE
                EXIT;
            END IF;
        END LOOP;
        -- Emit one row per cluster through the output columns.
        user_id := cur_user;
        x := cur_x;
        y := cur_y;
        sum_min := total;
        RETURN NEXT;
    END LOOP;
    CLOSE cur;
    RETURN;
END;
$$ LANGUAGE plpgsql;
That is a lot of code and not particularly fast, but it should work.
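As a set-based alternative to the cursor, the clusters can also be identified with window functions. The following is a minimal sketch, not tested on Redshift itself. It assumes the table is called data (as in the function above) and that created_at is unique within each user_id; the alias grp and the derived-table name t are made up for illustration. Within one user, consecutive rows with the same (x, y) share the same difference between the two row numbers, so that difference can serve as a grouping key.

```sql
-- Sketch only: assumes a table named "data" and a strict created_at order per user.
SELECT user_id, x, y, SUM(min) AS sum_min
FROM (
    SELECT user_id, x, y, min, created_at,
           -- Position within the user minus position within the (user, x, y) group:
           -- the difference stays constant while x and y do not change.
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at)
         - ROW_NUMBER() OVER (PARTITION BY user_id, x, y ORDER BY created_at) AS grp
    FROM data
) AS t
GROUP BY user_id, x, y, grp
ORDER BY user_id, MIN(created_at);
```

Traced by hand against the sample rows in the question, this yields the six expected rows, with the two separate (1, 1) runs of user 1 kept apart.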
Upvotes: 0
Reputation: 21915
Try this:
with cte as (
    select user_id, x, y, created_at,
           sum(min) over (partition by user_id, x, y, created_day) as sum_min
    from (
        select user_id, x, y, min, created_at,
               replace(created_at::date::text, '-', '') as created_day
        from usr
    ) t
)
select user_id, x, y, sum_min
from cte
group by sum_min, user_id, x, y
order by user_id
Upvotes: 1
Reputation: 768
Maybe try grouping it by the creation date as well:
select user_id, x, y, sum(min) as sum_min, created_at::date as created_day
from test
group by user_id, x, y, created_at::date
order by user_id, x, y, created_day
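Grouping by date matches the expected output only as long as two separate runs of the same (user_id, x, y) never fall on the same calendar day. A sketch that marks the run boundaries explicitly with LAG instead is shown here. It assumes the same test table, non-null x and y, and a created_at that is unique per user; is_new_run, run_no, flagged and numbered are illustrative names, not anything from the question.

```sql
-- Sketch only: flag the first row of each run, then number the runs with a running sum.
SELECT user_id, x, y, SUM(min) AS sum_min
FROM (
    SELECT user_id, x, y, min, created_at,
           -- Running count of run starts = run number within the user.
           SUM(is_new_run) OVER (PARTITION BY user_id ORDER BY created_at
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_no
    FROM (
        SELECT user_id, x, y, min, created_at,
               -- 0 when the previous row of the same user has the same x and y, else 1.
               -- The first row of a user has no previous row (LAG is NULL), so it gets 1.
               CASE WHEN x = LAG(x) OVER (PARTITION BY user_id ORDER BY created_at)
                     AND y = LAG(y) OVER (PARTITION BY user_id ORDER BY created_at)
                    THEN 0 ELSE 1 END AS is_new_run
        FROM test
    ) AS flagged
) AS numbered
GROUP BY user_id, run_no, x, y
ORDER BY user_id, run_no;
```

The explicit frame clause on the running sum is there because Redshift requires one for aggregate window functions that use ORDER BY; PostgreSQL accepts it as well.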
Upvotes: 0