Aman
Aman

Reputation: 353

Conditional sum on columns in postgresql

Subway data Columns:

  1. Station_id (1,2,3,4,5,6,7,8,1,2,3,4,5,1,2,3,4,5,6,7,1,2,3)
  2. Number of people boarded
  3. Number of people deboarded
  4. Occupancy

Occupancy at current station = Number of people already in the train at previous station + Number of people boarded - Number of people deboarded

I am trying to fill the occupancy column. The issue is that the dataset is for multiple subway trains so station_id changes back to 1 and on that station number of people deboarded is always 0 since it is the station from which train journey begins. I have got no clue on how to do this in postgresql. The occupancy column in the sample image below is empty and needs to be filled

The train journeys are sorted and grouped. enter image description here

Upvotes: 0

Views: 939

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269483

You can do this with the difference of the cumulative sums. The trick is identifying the groups, which I'll do by counting the number of times that station_id has been 1 up to that record.

select s.*,
       (sum(boarded) over (partition by grp order by id) -
        sum(deboarded) over (partition by grp order by id)
       ) as occupants
from (select s.*,
             count(*) filter (where station_id = 1) over (order by id) as grp
      from subwaydata s
     ) s;

Upvotes: 3

Related Questions