Reputation: 1301

Select latest values for group of related records

I have a table holding data that is logically groupable by multiple properties (a foreign key, for example). The data is sequential over a continuous time interval, i.e. it is time series data. What I am trying to achieve is to select only the latest values for each combination of groups.

Here is some example data:

+-----------------------------------------+
| code | value | date       | relation_id |
+-----------------------------------------+
| A    | 1     | 01.01.2016 | 1           |
| A    | 2     | 02.01.2016 | 1           |
| A    | 3     | 03.01.2016 | 1           |
| A    | 4     | 01.01.2016 | 2           |
| A    | 5     | 02.01.2016 | 2           |
| A    | 6     | 03.01.2016 | 2           |
| B    | 1     | 01.01.2016 | 1           |
| B    | 2     | 02.01.2016 | 1           |
| B    | 3     | 03.01.2016 | 1           |
| B    | 4     | 01.01.2016 | 2           |
| B    | 5     | 02.01.2016 | 2           |
| B    | 6     | 03.01.2016 | 2           |
+-----------------------------------------+

And here is an example of the desired output:

+-----------------------------------------+
| code | value | date       | relation_id |
+-----------------------------------------+
| A    | 3     | 03.01.2016 | 1           |
| A    | 6     | 03.01.2016 | 2           |
| B    | 3     | 03.01.2016 | 1           |
| B    | 6     | 03.01.2016 | 2           |
+-----------------------------------------+

To put this in perspective: for every related object, I want to select each code with its latest date.
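As a minimal sketch of the intended result, the Python below keeps the newest row per (code, relation_id) pair from the sample table. It assumes the dates are dd.mm.yyyy and rewrites them as ISO-8601 strings so that plain string comparison orders them chronologically; `ROWS` and `latest_per_group` are hypothetical names.

```python
# Sample rows from the question: (code, value, date, relation_id).
# Dates rewritten as ISO-8601 strings (assumption: the question uses dd.mm.yyyy)
# so that string comparison orders them chronologically.
ROWS = [
    ("A", 1, "2016-01-01", 1), ("A", 2, "2016-01-02", 1), ("A", 3, "2016-01-03", 1),
    ("A", 4, "2016-01-01", 2), ("A", 5, "2016-01-02", 2), ("A", 6, "2016-01-03", 2),
    ("B", 1, "2016-01-01", 1), ("B", 2, "2016-01-02", 1), ("B", 3, "2016-01-03", 1),
    ("B", 4, "2016-01-01", 2), ("B", 5, "2016-01-02", 2), ("B", 6, "2016-01-03", 2),
]


def latest_per_group(rows):
    """Return the row with the greatest date for each (code, relation_id) pair."""
    latest = {}
    for row in rows:
        code, _value, date, relation_id = row
        key = (code, relation_id)
        # Keep this row if it is the first one for the key or newer than the stored one.
        if key not in latest or date > latest[key][2]:
            latest[key] = row
    # Sort by key so the output order is stable and readable.
    return [latest[k] for k in sorted(latest)]


for row in latest_per_group(ROWS):
    print(row)
```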

Here is the query I came up with, using the ROW_NUMBER() OVER (PARTITION BY ...) approach:

SELECT indicators.code, indicators.dimension, indicators.unit,
       x.value, x.date, x.ticker, x.name
FROM (
  -- Number each indicator's rows from newest (r = 1) to oldest.
  SELECT
    ROW_NUMBER() OVER (PARTITION BY indicator_id ORDER BY date DESC) AS r,
    t.indicator_id, t.value, t.date, t.company_id, companies.sic_id,
    companies.ticker, companies.name
  FROM fundamentals t
  INNER JOIN companies ON companies.id = t.company_id
  WHERE companies.sic_id = 89
) x
INNER JOIN indicators ON indicators.id = x.indicator_id
-- Keep as many of the newest rows per indicator as there are companies in SIC 89.
WHERE x.r <= (SELECT count(*) FROM companies WHERE sic_id = 89)

It works, but it is painfully slow: on about 5% of the production data (roughly 3 million fundamentals records), this query takes about 10 seconds to finish. My guess is that this happens because the subquery selects a huge number of records first.

Is there any way to speed this query up, or am I digging in the wrong direction by doing it this way?
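One way to express this with ROW_NUMBER() is to partition by both grouping columns and keep only the first row of each partition, instead of comparing the row number against a company count. The sketch below is a minimal illustration against the sample data using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the table name `mytable`, the index name and the ISO-formatted dates are assumptions. The composite index only shows the shape commonly suggested for this pattern (partition columns, then the ordering column); it is not a measured speed-up, and behaviour on PostgreSQL with production data may differ.

```python
import sqlite3

# Sample rows from the question, dates as ISO-8601 strings (assumption).
ROWS = [
    ("A", 1, "2016-01-01", 1), ("A", 2, "2016-01-02", 1), ("A", 3, "2016-01-03", 1),
    ("A", 4, "2016-01-01", 2), ("A", 5, "2016-01-02", 2), ("A", 6, "2016-01-03", 2),
    ("B", 1, "2016-01-01", 1), ("B", 2, "2016-01-02", 1), ("B", 3, "2016-01-03", 1),
    ("B", 4, "2016-01-01", 2), ("B", 5, "2016-01-02", 2), ("B", 6, "2016-01-03", 2),
]

LATEST_BY_ROW_NUMBER = """
    SELECT code, value, date, relation_id
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY code, relation_id  -- one partition per group
                   ORDER BY date DESC              -- newest row gets r = 1
               ) AS r
        FROM mytable AS t
    ) AS x
    WHERE r = 1
    ORDER BY code, relation_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code TEXT, value INTEGER, date TEXT, relation_id INTEGER)")
# Hypothetical index covering the partition columns followed by the sort column.
conn.execute("CREATE INDEX mytable_group_date ON mytable (code, relation_id, date DESC)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

latest = conn.execute(LATEST_BY_ROW_NUMBER).fetchall()
for row in latest:
    print(row)
```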

Upvotes: 0

Answers (4)

Nagaraj

Reputation: 231

I believe you can try something like this:

SELECT CODE, Relation_ID, Date, MAX(value) AS value
FROM mytable
GROUP BY CODE, Relation_ID, Date
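To see how this grouping behaves, the sketch below loads the question's sample rows into an in-memory SQLite table (hypothetical name `mytable`, ISO dates assumed) and runs the query as written. Because each (code, relation_id, date) combination occurs only once in the sample, every group holds a single row, so the result has one row per date rather than one per (code, relation_id).

```python
import sqlite3

# Sample rows from the question, dates as ISO-8601 strings (assumption).
ROWS = [
    ("A", 1, "2016-01-01", 1), ("A", 2, "2016-01-02", 1), ("A", 3, "2016-01-03", 1),
    ("A", 4, "2016-01-01", 2), ("A", 5, "2016-01-02", 2), ("A", 6, "2016-01-03", 2),
    ("B", 1, "2016-01-01", 1), ("B", 2, "2016-01-02", 1), ("B", 3, "2016-01-03", 1),
    ("B", 4, "2016-01-01", 2), ("B", 5, "2016-01-02", 2), ("B", 6, "2016-01-03", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code TEXT, value INTEGER, date TEXT, relation_id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# The answer's query: Date is part of the GROUP BY key, so each date forms its own group.
grouped = conn.execute(
    "SELECT code, relation_id, date, MAX(value) AS value "
    "FROM mytable GROUP BY code, relation_id, date"
).fetchall()
print(len(grouped))  # 12
```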

Upvotes: 0

IceCreamSandwich

Reputation: 43

Another option:

SELECT DISTINCT Code,
       Relation_ID,
       FIRST_VALUE(Value) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) AS Value,
       FIRST_VALUE(Date) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) AS Date
FROM mytable

This returns the top value for whatever you partition by, according to whatever you order by.
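The sketch below runs this FIRST_VALUE query against the question's sample rows in an in-memory SQLite database (window functions need SQLite 3.25 or newer); `mytable` and the ISO dates are assumptions. Each window function yields the same value for every row of a partition, and DISTINCT then collapses those repeats to one row per (code, relation_id).

```python
import sqlite3

# Sample rows from the question, dates as ISO-8601 strings (assumption).
ROWS = [
    ("A", 1, "2016-01-01", 1), ("A", 2, "2016-01-02", 1), ("A", 3, "2016-01-03", 1),
    ("A", 4, "2016-01-01", 2), ("A", 5, "2016-01-02", 2), ("A", 6, "2016-01-03", 2),
    ("B", 1, "2016-01-01", 1), ("B", 2, "2016-01-02", 1), ("B", 3, "2016-01-03", 1),
    ("B", 4, "2016-01-01", 2), ("B", 5, "2016-01-02", 2), ("B", 6, "2016-01-03", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code TEXT, value INTEGER, date TEXT, relation_id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# FIRST_VALUE picks the newest row's value/date in each (code, relation_id) partition;
# DISTINCT removes the duplicates produced for the other rows of that partition.
latest = conn.execute("""
    SELECT DISTINCT code,
           relation_id,
           FIRST_VALUE(value) OVER (PARTITION BY code, relation_id ORDER BY date DESC) AS value,
           FIRST_VALUE(date)  OVER (PARTITION BY code, relation_id ORDER BY date DESC) AS date
    FROM mytable
    ORDER BY code, relation_id
""").fetchall()

for row in latest:
    print(row)
```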

Upvotes: 0

Gordon Linoff

Reputation: 1269463

Postgres offers the convenient DISTINCT ON for this purpose:

select distinct on (relation_id, code) t.*
from t
order by relation_id, code, date desc;
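SQLite has no DISTINCT ON, so this sketch emulates its semantics in plain Python: sort by the ORDER BY list, then keep the first row for each distinct (relation_id, code) value. In PostgreSQL the DISTINCT ON expressions must match the leftmost ORDER BY expressions, which is why `date desc` comes last. The helper `distinct_on` and the ISO-formatted dates are hypothetical.

```python
from datetime import date

# Sample rows from the question: (code, value, date, relation_id), ISO dates assumed.
ROWS = [
    ("A", 1, "2016-01-01", 1), ("A", 2, "2016-01-02", 1), ("A", 3, "2016-01-03", 1),
    ("A", 4, "2016-01-01", 2), ("A", 5, "2016-01-02", 2), ("A", 6, "2016-01-03", 2),
    ("B", 1, "2016-01-01", 1), ("B", 2, "2016-01-02", 1), ("B", 3, "2016-01-03", 1),
    ("B", 4, "2016-01-01", 2), ("B", 5, "2016-01-02", 2), ("B", 6, "2016-01-03", 2),
]


def distinct_on(rows, key, sort_key):
    """Keep the first row of each key group after sorting by sort_key,
    mimicking SELECT DISTINCT ON (...) ... ORDER BY ... ."""
    seen = set()
    result = []
    for row in sorted(rows, key=sort_key):
        k = key(row)
        if k not in seen:  # the first row of this group in sort order wins
            seen.add(k)
            result.append(row)
    return result


# Equivalent of: DISTINCT ON (relation_id, code) ... ORDER BY relation_id, code, date DESC.
# Negating the date's ordinal sorts the newest row first within each group.
latest = distinct_on(
    ROWS,
    key=lambda r: (r[3], r[0]),
    sort_key=lambda r: (r[3], r[0], -date.fromisoformat(r[2]).toordinal()),
)
for row in latest:
    print(row)
```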

Upvotes: 1

Adam Martin

Reputation: 1218

Your query uses different column names than your sample data, so it's hard to tell, but it looks like you just want to group by everything except the date (and the value). Assuming no group has multiple rows on its most recent date, something like this should work. Basically, don't use the window function; use a proper GROUP BY, and your engine should be able to optimize the query better.

SELECT mytable.code,
       mytable.value,
       mytable.date,
       mytable.relation_id
  FROM mytable
  JOIN (
        SELECT code, 
               max(date) as date, 
               relation_id
          FROM mytable
      GROUP BY code, relation_id
       ) Q1
    ON Q1.code = mytable.code
   AND Q1.date = mytable.date
   AND Q1.relation_id = mytable.relation_id
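A minimal sketch of this join against the sample rows in an in-memory SQLite table (`mytable` and the ISO dates are assumptions). It also inserts a hypothetical second row on the latest date for (A, 1) to illustrate the caveat about multiple most recent dates: when several rows share the maximum date, the join returns all of them.

```python
import sqlite3

# Sample rows from the question, dates as ISO-8601 strings (assumption).
ROWS = [
    ("A", 1, "2016-01-01", 1), ("A", 2, "2016-01-02", 1), ("A", 3, "2016-01-03", 1),
    ("A", 4, "2016-01-01", 2), ("A", 5, "2016-01-02", 2), ("A", 6, "2016-01-03", 2),
    ("B", 1, "2016-01-01", 1), ("B", 2, "2016-01-02", 1), ("B", 3, "2016-01-03", 1),
    ("B", 4, "2016-01-01", 2), ("B", 5, "2016-01-02", 2), ("B", 6, "2016-01-03", 2),
]

# Join each row to its group's maximum date; only rows on that date survive.
LATEST_BY_JOIN = """
    SELECT mytable.code, mytable.value, mytable.date, mytable.relation_id
    FROM mytable
    JOIN (
        SELECT code, MAX(date) AS date, relation_id
        FROM mytable
        GROUP BY code, relation_id
    ) AS q1
      ON q1.code = mytable.code
     AND q1.date = mytable.date
     AND q1.relation_id = mytable.relation_id
    ORDER BY mytable.code, mytable.relation_id, mytable.value
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code TEXT, value INTEGER, date TEXT, relation_id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

latest = conn.execute(LATEST_BY_JOIN).fetchall()
print(latest)

# Hypothetical tie: a second (A, 1) row on the same latest date.
conn.execute("INSERT INTO mytable VALUES ('A', 7, '2016-01-03', 1)")
with_tie = conn.execute(LATEST_BY_JOIN).fetchall()
print(len(with_tie))  # 5: both tied (A, 1) rows are returned
```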

Upvotes: 0
