Advanced group by

Question

I have a table with 5 columns:

property1
property2
property3
rank
date

For a given tuple of properties, I get an array of records, which vary in rank and date. For example:

p1value_1, p2value_1, p3value_1 100, 2013-01-12
p1value_1, p2value_1, p3value_1 200, 2013-02-12
p1value_1, p2value_1, p3value_1, 75, 2013-03-12

For some next available set of properties, I get something different. For example:

p1value_1, p2value_1, p3value_2  30, 2013-01-12
p1value_1, p2value_1, p3value_2  15, 2013-02-12
p1value_1, p2value_1, p3value_2, 80, 2013-03-12

(notice a change from p3value_1 to p3value_2).

I need to figure out whether the change in rank between the start and end dates was positive or negative. A change is considered positive ("moved up") if the rank has lowered (i.e. the lower rank, the better). For an example above the change is calculated as such:

-(75-100) =  25 > 0  --  "moved up"
-(80-30)  = -50 < 0  --  "moved down"

The task is to calculate amount of all positive as well as all negative changes, for each available configuration (set) of properties.

Ideally, a final return result would look like this:

moved up | moved down
---------------------
      13 |         28

I figured I probably need to do the following steps:

turn all the records I'm interested into small groups (sets) like shown in example above,
sort each small group by date, pick first and last item as first and last,
decide what kind of change was that (moved up or down) by calculating
```
-(last-first)
```
– that'd be the value of a given small group,
count amount of "moved up" groups, as well as "moved down" groups and return the answer.

Any guidance, links or directions are appreciated. If something needs clarification, please just tell me and I'll do my best to clarify it.

Update. I'm using postgresql.

Hart CO · Accepted Answer

Syntax might be off a bit, but in postgresql you can use the FIRST_VALUE() and LAST_VALUE() functions, something like this:

SELECT  property1
       ,property2
       ,property3
       ,last_value(rank) over(partition by property1,property2,property3 order by date)
        - first_value(rank) over(partition by property1,property2,property3 order by date) AS Rank_Change
FROM YourTable

That would return the Rank_Change on every row, then you'd need to group those down to 1 per row and use a conditional SUM():

SELECT SUM(CASE WHEN Rank_Change < 0 THEN Rank_Change END) AS Total_Rank_Down

Update:

SELECT SUM(CASE WHEN Rank_Change < 0 THEN Rank_Change END) AS Total_Down
      ,SUM(CASE WHEN Rank_Change > 0 THEN Rank_Change END) AS Total_Up
FROM (
      SELECT search_engine,
             domain,
             location,
             MAX(Rank_Change) AS Rank_Change
      FROM (SELECT
             search_engine,
             domain,
             location,
             first_value(rank) over(partition BY search_engine, domain, location ORDER BY date DESC) -
             first_value(rank) over(partition BY search_engine, domain, location ORDER BY date) AS Rank_Change
            FROM ranks
            )AS Sub
      GROUP BY search_engine,
               domain,
               location
     )as SubSub

Last_Value actually is quirky, so you need to use first_value for both and change the ORDER BY to DESC for one of them.

Demo: SQL Fiddle

Advanced group by

Answers (1)

Related Questions