logjammin
logjammin

Reputation: 1211

In PostgreSQL, calculate the average duration (interval of time) that an ID appears in a single Date column

Background

I'm pretty new to SQL and to Postgres. I'm trying to get some basic descriptions of some data in a large table about car and motorcycle insurance claims -- call the table claims. The data have ~15 columns, but only 2 are relevant here:

     id        | claim_date   |
---------------+--------------|
 001           |  2011-01-30  |
 001           |  2012-04-12  |
 001           |  2014-06-27  |
 002           |  2017-09-03  |
 002           |  2018-12-25  |

The Problem

I'd like to know the average interval of time that IDs appear in this dataset. Were I to do this by hand, I'd subtract the earliest from the latest claim_date for each ID, sum them, and divide by the number of distinct IDs (2, in this case). This'd give me a result of 861 days here.

The main issue is that I'm not really sure how to write this out in query form in Postgres.

What I've Tried

In similar posts, other users have asked about subtracting the date in one date column from another to get a date interval, but that's not exactly what I'm after.

I've tried a query like this:

select max(claims.claim_date) - min(claims.claim_date) date_diff
from claims
group by ID

But only gets me part way, just a list of max-min intervals for each ID. I've also tried using AVG like so:

select AVG(max(claims.claim_date) - min(claims.claim_date))
from claims
group by ID

But here I get the error ERROR: aggregate function calls cannot be nested.

Any ideas?

Upvotes: 0

Views: 662

Answers (1)

sticky bit
sticky bit

Reputation: 37472

Just use two levels of aggregation.

SELECT avg(d) AS ad
       FROM (SELECT max(claim_date)
                    - min(claim_date) AS d
                    FROM claims
                    GROUP BY id) AS x;

Upvotes: 2

Related Questions