How to count number of previous events and time differences in PostgreSQL?

Question

I am new to SQL and I am communicating with a PostgreSQL database via queries. I have the following question: My (simplified) data table looks like this.

 DROP TABLE table;
    CREATE TABLE table(
       id   INTEGER  NOT NULL PRIMARY KEY 
      ,date DATE  NOT NULL
      ,key  BIT  NOT NULL
    );
    INSERT INTO table(id,date,key) VALUES (1,'18/02/05',0);
    INSERT INTO table(id,date,key) VALUES (1,'20/02/05',1);
    INSERT INTO table(id,date,key) VALUES (1,'21/02/05',0);
    INSERT INTO table(id,date,key) VALUES (1,'10/04/06',0);
    INSERT INTO table(id,date,key) VALUES (2,'09/05/08',0);
    INSERT INTO table(id,date,key) VALUES (2,'17/06/08',1);
    INSERT INTO table(id,date,key) VALUES (2,'22/06/08',1);
    INSERT INTO table(id,date,key) VALUES (2,'23/06/08',1);

+----+------------+-----+
| id |    date    | key |
+----+------------+-----+
|  1 | 2005-02-18 |   0 |
|  1 | 2005-02-20 |   1 |
|  1 | 2005-02-21 |   0 |
|  1 | 2006-04-10 |   0 |
|  2 | 2008-05-09 |   0 |
|  2 | 2008-06-17 |   1 |
|  2 | 2008-06-22 |   1 |
|  2 | 2008-06-23 |   1 |
+----+------------+-----+

Where id identifies different groups in my data, date (formatted as date column) indicates the date a particular event occurred and key identifies important events in my data set. Now, I need to conduct the following tasks for each group of observations.

A) Count the number of past key events in a particular time window for each date entry (let's say 7 days for the moment), in other words: for every date entry: How many times did a key event occur in the last 7 days (count key=1 for date-7 days) Comment: this is how it looks like in stata
B) Calculate the time difference in days between each event and the most recent key event, (date - last(date where key=1) =x. (ANSWERED, check out Gordon's post) The final result should look like this:

+----+------------+-----+--------+-----------+
| id |    date    | key | number | time_diff |
+----+------------+-----+--------+-----------+
|  1 | 2005-02-18 |   0 |      0 | NA        |
|  1 | 2005-02-20 |   1 |      0 | 0         |
|  1 | 2005-02-21 |   0 |      1 | 1         |
|  1 | 2006-04-10 |   0 |      0 | 413       |
|  2 | 2008-05-09 |   0 |      0 | NA        |
|  2 | 2008-06-17 |   1 |      0 | 0         |
|  2 | 2008-06-22 |   1 |      1 | 5         |
|  2 | 2008-06-23 |   1 |      2 | 1         |
+----+------------+-----+--------+-----------+

All events that occurred before the first key event in a particular group should be tagged as NULL or NA.

I tried to solve B with help of this blog but I am using Postgresql 9.3. and the FILTER clause is a feature of v9.4 if I am not mistaken.

My idea was to try the following:

         dataset <- dbGetQuery(channel, "SELECT t1.*, t1.date -
                               (
                                    SELECT MIN(t2.date)
                                    FROM table t2 
                                    WHERE t1.id = t2.id AND t2.key==1 
                                    AND t1.date-t2.date <= 7 AND t1.date-t2.date >= 0
                               ) AS time_diff FROM table t1 ORDER BY t1.id, t1.date")

But the result was not very satisfying, if there is more than one key event in my time window. I assume that I need to use a window function, specify my key events as FIRST_VALUE or set some kind of time interval but I have no clear idea how to implement this in order to reach the desired result. As you can see, I am using R to send the query to the DB.

Any help is appreciated. In case you need additional information, please let me know, especially since this is my first question on SO.

How to count number of previous events and time differences in PostgreSQL?

Answers (1)

Related Questions