s_Helle
s_Helle

Reputation: 21

How to count number of previous events and time differences in PostgreSQL?

I am new to SQL and I am communicating with a PostgreSQL database via queries. I have the following question: My (simplified) data table looks like this.

 DROP TABLE table;
    CREATE TABLE table(
       id   INTEGER  NOT NULL PRIMARY KEY 
      ,date DATE  NOT NULL
      ,key  BIT  NOT NULL
    );
    INSERT INTO table(id,date,key) VALUES (1,'18/02/05',0);
    INSERT INTO table(id,date,key) VALUES (1,'20/02/05',1);
    INSERT INTO table(id,date,key) VALUES (1,'21/02/05',0);
    INSERT INTO table(id,date,key) VALUES (1,'10/04/06',0);
    INSERT INTO table(id,date,key) VALUES (2,'09/05/08',0);
    INSERT INTO table(id,date,key) VALUES (2,'17/06/08',1);
    INSERT INTO table(id,date,key) VALUES (2,'22/06/08',1);
    INSERT INTO table(id,date,key) VALUES (2,'23/06/08',1);  
+----+------------+-----+
| id |    date    | key |
+----+------------+-----+
|  1 | 2005-02-18 |   0 |
|  1 | 2005-02-20 |   1 |
|  1 | 2005-02-21 |   0 |
|  1 | 2006-04-10 |   0 |
|  2 | 2008-05-09 |   0 |
|  2 | 2008-06-17 |   1 |
|  2 | 2008-06-22 |   1 |
|  2 | 2008-06-23 |   1 |
+----+------------+-----+ 

Where id identifies different groups in my data, date (formatted as date column) indicates the date a particular event occurred and key identifies important events in my data set. Now, I need to conduct the following tasks for each group of observations.

+----+------------+-----+--------+-----------+
| id |    date    | key | number | time_diff |
+----+------------+-----+--------+-----------+
|  1 | 2005-02-18 |   0 |      0 | NA        |
|  1 | 2005-02-20 |   1 |      0 | 0         |
|  1 | 2005-02-21 |   0 |      1 | 1         |
|  1 | 2006-04-10 |   0 |      0 | 413       |
|  2 | 2008-05-09 |   0 |      0 | NA        |
|  2 | 2008-06-17 |   1 |      0 | 0         |
|  2 | 2008-06-22 |   1 |      1 | 5         |
|  2 | 2008-06-23 |   1 |      2 | 1         |
+----+------------+-----+--------+-----------+ 

All events that occurred before the first key event in a particular group should be tagged as NULL or NA.

I tried to solve B with help of this blog but I am using Postgresql 9.3. and the FILTER clause is a feature of v9.4 if I am not mistaken.

My idea was to try the following:

         dataset <- dbGetQuery(channel, "SELECT t1.*, t1.date -
                               (
                                    SELECT MIN(t2.date)
                                    FROM table t2 
                                    WHERE t1.id = t2.id AND t2.key==1 
                                    AND t1.date-t2.date <= 7 AND t1.date-t2.date >= 0
                               ) AS time_diff FROM table t1 ORDER BY t1.id, t1.date")

But the result was not very satisfying, if there is more than one key event in my time window. I assume that I need to use a window function, specify my key events as FIRST_VALUE or set some kind of time interval but I have no clear idea how to implement this in order to reach the desired result. As you can see, I am using R to send the query to the DB.

Any help is appreciated. In case you need additional information, please let me know, especially since this is my first question on SO.

Upvotes: 2

Views: 825

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269445

Your question seems to be around "B" rather than "A".

You can do "B" with window functions, but it involves a conditional forward max scan on the date, rather than lag():

select t.*,
       (date -
        max(case when key = 1 then date end) over (partition by id order by date)
       ) as time_diff
from t;

Upvotes: 1

Related Questions