Daksh
Daksh

Reputation: 986

Combinational SQL count query in Postgres

I have some stations stored in a simple stations collection:

+----+-----------+
| id | name      |
+----+-----------+
| 1  | Station A |
+----+-----------+
| 2  | Station B |
+----+-----------+
| 3  | Station C |
+----+-----------+
| 4  | Station D |
+----+-----------+

And I have some rides stored in the rides collection:

+----+---------------+-------------+
| id | fromStationId | toStationId |
+----+---------------+-------------+
| 1  | 3             | 4           |
+----+---------------+-------------+
| 2  | 2             | 1           |
+----+---------------+-------------+
| 3  | 1             | 1           |
+----+---------------+-------------+
| 4  | 3             | 2           |
+----+---------------+-------------+

I would like to create a count list of all inter-station rides between all possible pairs of fromStation names and toStation names with the result looking like this:

[
  {
    "fromStation": "Station A",
    "toStation": "Station A",
    "count": 1196
  },
  {
    "fromStation": "Station A",
    "toStation": "Station B",
    "count": 1
  },
  {
    "fromStation": "Station A",
    "toStation": "Station C",
    "count": 173
  },
]

And so on for all other combinations...

How do I get all possible two-pair combinations of station names and then count the number of rides among them? I'm using the latest version of Postgres.

Upvotes: 0

Views: 94

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656754

Aggregate rides first, then resolve IDs to names:

SELECT f.name AS from_station, t.name AS to_station, count
FROM  (
   SELECT from_station_id, to_station_id, count(*) AS count
   FROM   rides
   GROUP  BY 1, 2
   ) r
JOIN   stations f ON f.id = r.from_station_id
JOIN   stations t ON t.id = r.to_station_id
ORDER  BY 1, 2;  -- optional order

Of course, that only produces combinations with actual rides. If you need to include combinations without any rides, you need an OUTER join to a Cartesian product of the stations table with itself. Something like:

-- include all combinations (even without rides)
SELECT from_station, to_station, COALESCE(count, 0) AS count
FROM  (
   SELECT from_station_id, to_station_id, count(*) AS count
   FROM   rides
   GROUP  BY 1, 2
   ) r
RIGHT  JOIN (
   SELECT f.id AS from_id, f.name AS from_station
        , t.id AS to_id  , t.name AS to_station
   FROM   stations f CROSS JOIN stations t
   ) s ON  s.from_id = r.from_station_id
      AND  s.to_id   = r.to_station_id
ORDER  BY 1, 2;  -- optional order

Again, it's cheaper to aggregate rides before joining to stations.

To wrap it up as JSON array or records, simply:

SELECT json_agg(sub)
FROM  (
   -- query from above
   ) sub;

db<>fiddle here

Upvotes: 1

S-Man
S-Man

Reputation: 23676

demos:db<>fiddle

SELECT 
    c.from_station,
    c.to_station,
    COUNT(*)
FROM stations s1
JOIN stations s2 ON s1.station <> s2.station                                     -- 1
JOIN connections c ON s1.station = c.from_station AND s2.station = c.to_station  -- 2
GROUP BY c.from_station, c.to_station                                            -- 3
  1. Create a self-join on the station table. The <> join condition ensures that same station will not be joined. All other station will be joined on each other, which creates all combinations
  2. Join this result on your connections table using both station columns from the self-join for the from and to points of the connections
  3. Now you can group by from and to and COUNT(*) aggregate this.

If you want to recognize cases where from station equals to station, you can change

JOIN stations s2 ON s1.station <> s2.station

into a simple:

CROSS JOIN stations s2

If you want to get a JSON object as you shown in the question:

SELECT 
    json_agg(connection)                      -- 2
FROM (
    SELECT 
        json_build_object(                    -- 1
            'fromStation', c.from_station,
            'toStation', c.to_station,
            'count', COUNT(*)
        ) as connection
    FROM stations s1
    JOIN stations s2 ON s1.station <> s2.station
    JOIN connections c ON s1.station = c.from_station AND s2.station = c.to_station
    GROUP BY c.from_station, c.to_station
) s
  1. Create your JSON object from your columns which you created above
  2. Aggregate them into a JSON array.

Upvotes: 1

Related Questions